In this paper, the maximal nonlinear conditional correlation of two random vectors X and Y given another random vector Z, denoted by $\rho_1(X,Y|Z)$, is defined as a measure of conditional association, which satisfies certain desirable properties. When Z is continuous, a test of the conditional independence of X and Y given Z is constructed based on the estimator of a weighted average of the form $\sum_{k=1}^{n_Z} f_Z(z_k)\rho_1^2(X,Y|Z=z_k)$, where $f_Z$ is the probability density function of Z and the $z_k$'s are some points in the range of Z. Under some conditions, it is shown that the test statistic is asymptotically normal under conditional independence, and the test is consistent.

1. Introduction. In this paper, the problem of interest is testing the conditional independence between two random vectors X and Y given a third random vector Z. The study of the problem of testing conditional independence has a long history. However, there are relatively few results on nonparametric tests when the vectors X, Y and Z are continuous. Some examples of such tests can be found in Su and White [12, 13], who proposed conditional independence tests based on a weighted Hellinger distance between the conditional densities or the difference between the conditional characteristic functions.

As mentioned in Daudin [2], conditional independence of X and Y given Z means that for every $f(X,Z)$ and $g(Y,Z)$ such that $Ef^2(X,Z)$



Received September 2009; revised November 2009. 

1 Part of this work was done while the author was visiting the Institute of Statistical Science at Academia Sinica in Taiwan.

2 Supported by the National Science Council of Taiwan through Grants NSC 95-2119-M-004-001- and NSC 97-2118-M-004-001-.

3 Supported in part by the Center for Service Innovation at National Chengchi University.

AMS 2000 subject classifications. Primary 62H20; secondary 62H15, 62G10. 

Key words and phrases. Measure of association, measure of conditional association, 
conditional independence test. 



This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Statistics, 
2010, Vol. 38, No. 4, 2047-2091. This reprint differs from the original in 
pagination and typographic detail. 




T.-M. HUANG 



and $Eg^2(Y,Z)$ are finite,

    $E(f(X,Z)g(Y,Z)|Z) = E(f(X,Z)|Z)\,E(g(Y,Z)|Z)$.

Thus, the problem of testing conditional independence, as the problem of 
testing unconditional independence, is invariant when one-to-one transforms 
are applied to the marginals X and Y, respectively. Various authors have 
taken this invariant property into consideration when constructing condi- 
tional or unconditional independence tests. For example, Su and White [13] 
used Hellinger distance in their test statistic for testing conditional indepen- 
dence, so that the test statistic is invariant. Dauxois and Nkiet [3] used mea- 
sures of association to construct independence tests, and the measures are 
invariant under the above transforms. In this paper, to take invariance into 
account, the proposed test is based on the maximal nonlinear conditional 
correlation, which can be viewed as a measure of conditional association and 
satisfies the above invariance property. 
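
As an illustration only, the factorization criterion quoted from Daudin [2] can be checked numerically on a small discrete model. The sketch below assumes hypothetical finite supports for X, Y and Z and hand-picked probability tables; the helper names `cond_expectation` and `factorization_gap` are invented for this example and are not part of the paper.

```python
import numpy as np

# Hypothetical finite model: Z in {0, 1}, X in {0, 1, 2}, Y in {0, 1, 2, 3}.
p_z = np.array([0.4, 0.6])
p_x_given_z = np.array([[0.2, 0.5, 0.3],
                        [0.6, 0.1, 0.3]])            # rows indexed by z
p_y_given_z = np.array([[0.1, 0.2, 0.3, 0.4],
                        [0.25, 0.25, 0.25, 0.25]])   # rows indexed by z

# p(x, y, z) = p(z) p(x|z) p(y|z): X and Y are conditionally independent given Z.
joint_ci = np.einsum("z,zx,zy->xyz", p_z, p_x_given_z, p_y_given_z)


def cond_expectation(table_xy, joint, z):
    """E(h(X, Y, z) | Z = z); table_xy must broadcast to h(x, y, z) over (x, y)."""
    slice_z = joint[:, :, z]
    return float((table_xy * slice_z).sum() / slice_z.sum())


def factorization_gap(joint, f, g):
    """max over z of |E(fg|Z=z) - E(f|Z=z) E(g|Z=z)| for tables f[x, z], g[y, z]."""
    gaps = []
    for z in range(joint.shape[2]):
        f_col = f[:, z][:, None]   # shape (n_x, 1)
        g_row = g[:, z][None, :]   # shape (1, n_y)
        lhs = cond_expectation(f_col * g_row, joint, z)
        rhs = cond_expectation(f_col, joint, z) * cond_expectation(g_row, joint, z)
        gaps.append(abs(lhs - rhs))
    return max(gaps)


rng = np.random.default_rng(0)
f = rng.normal(size=(3, 2))   # an arbitrary f(x, z)
g = rng.normal(size=(4, 2))   # an arbitrary g(y, z)
gap_ci = factorization_gap(joint_ci, f, g)

# Adding mass to the cell (x, y) = (0, 0) breaks conditional independence.
joint_dep = joint_ci.copy()
joint_dep[0, 0, :] += 0.05
joint_dep /= joint_dep.sum()
f_ind = np.zeros((3, 2))
f_ind[0, :] = 1.0             # indicator of {X = 0}
g_ind = np.zeros((4, 2))
g_ind[0, :] = 1.0             # indicator of {Y = 0}
gap_dep = factorization_gap(joint_dep, f_ind, g_ind)
print(gap_ci, gap_dep)
```

Under the conditionally independent table the gap vanishes up to rounding for any choice of f and g, while a single pair of indicator functions is enough to expose the dependence in the perturbed table.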

To choose a reasonable measure of conditional association between X and 
Y, the following properties are considered. 

(P1) The measure can be defined for all types of random vectors, including both discrete and continuous ones.
(P2) The measure is symmetric, that is, it remains the same when (X, Y) is replaced by (Y, X).
(P3) The measure is invariant when one-to-one transforms are applied to X and Y, respectively.
(P4) The measure is between 0 and 1.
(P5) The measure is 0 if and only if conditional independence holds.

The above properties are adapted from some of the conditions for a good 
measure of association proposed by Renyi [9]. In [9], the conditional inde- 
pendence in (P5) is replaced by the unconditional independence. Note that 
the symmetric property (P2) is not always required. For instance, Hsing et 
al. [6] proposed to use the coefficient of intrinsic dependence as a measure 
of dependence, which does not satisfy (P2). Here, (P2) is considered. 

Many measures of conditional association satisfying (P1)-(P5) can be constructed. Dauxois and Nkiet [4] showed that a class of measures of association between two Hilbertian subspaces can be obtained by properly combining the canonical coefficients of the canonical analysis (CA) between the spaces. In particular, take the two subspaces to be $H_1 = \{f(X,Z) - E(f(X,Z)|Z): Ef^2(X,Z) < \infty\}$ and $H_2 = \{g(Y,Z) - E(g(Y,Z)|Z): Eg^2(Y,Z) < \infty\}$, then a class of measures of conditional association between X and Y given Z satisfying properties (P1)-(P5) can be obtained using the canonical coefficients. Denote the canonical coefficients (arranged in descending order) by $\rho_i(X,Y|Z)$: $i = 1,2,\ldots$. When X and Y are not functions of Z, the largest canonical coefficient $\rho_1(X,Y|Z)$ is the maximal partial correlation defined by Romanovic [10], which is

    $\sup_{f,g} \mathrm{corr}(f(X,Z) - E(f(X,Z)|Z),\ g(Y,Z) - E(g(Y,Z)|Z))$.

Another approach to construct measures of conditional association is to modify the CA between the spaces $H_1 = \{f(X) - Ef(X): Ef^2(X) < \infty\}$ and $H_2 = \{g(Y) - Eg(Y): Eg^2(Y) < \infty\}$ to obtain a conditional version of it. That is, to find pairs of functions $(f_i,g_i)$: $i = 0,1,\ldots$, such that for each i, $(f_i,g_i)$ maximizes $E(f(X,Z)g(Y,Z)|Z)$ subject to

(1.1)  $E(f^2(X,Z)|Z)\,I_{(0,\infty)}(E(f^2(X,Z)|Z)) = I_{(0,\infty)}(E(f^2(X,Z)|Z))$,

(1.2)  $E(g^2(Y,Z)|Z)\,I_{(0,\infty)}(E(g^2(Y,Z)|Z)) = I_{(0,\infty)}(E(g^2(Y,Z)|Z))$

and

    $E(f(X,Z)f_j(X,Z)|Z) = 0 = E(g(Y,Z)g_j(Y,Z)|Z)$ for $0 \le j < i$.

Here, $I_A$ denotes the indicator function on a set A, that is, $I_A(x) = 1$ if $x \in A$ and $I_A(x) = 0$ otherwise. If the above $(f_i,g_i)$'s exist, then one can define $\rho_i(X,Y|Z) = E(f_i(X,Z)g_i(Y,Z)|Z)$ for each i and the $\rho_i(X,Y|Z)$'s can serve as a conditional version of canonical coefficients. A measure of conditional association satisfying (P1)-(P5) can be obtained by taking a proper
combination of the $\rho_i(X,Y|Z)$'s, following the approach in [4]. Examples of such combinations include $\rho_1(X,Y|Z)$ and $1 - \exp(-\rho_1^2(X,Y|Z))$. The measure of conditional association used in this paper is $\rho_1(X,Y|Z)$, which will be called the maximal nonlinear conditional correlation of two random vectors X and Y given Z from now on.

In the above definition of the $\rho_i(X,Y|Z)$'s, it is assumed that the $(f_i,g_i)$'s exist. However, it is not clear what conditions can guarantee the existence of the $(f_i,g_i)$'s. To avoid the problem of finding such conditions, a more general definition for $\rho_1(X,Y|Z)$ is given in Section 2. To construct a test based on $\rho_1(X,Y|Z)$, it is assumed that Z has a Lebesgue probability density function $f_Z$. An estimator of $\sum_k f_Z(z_k)\rho_1^2(X,Y|Z=z_k)$ is then used as the test statistic, where the $z_k$'s are some points in the range of Z. To study the asymptotic behavior of the test statistic under the hypothesis that X and Y are conditionally independent given Z, we follow the approach in [3] for finding the asymptotic distribution of a statistic for testing the independence between X and Y, which is based on estimators of the canonical coefficients from the CA of $H_1$ and $H_2$. To make the approach work for the conditional case, some strong approximation results for kernel estimators of certain conditional expectations are also established.

This paper is organized as follows. The new definition of $\rho_1(X,Y|Z)$ is given in Section 2. Section 3 deals with the estimation of $\rho_1(X,Y|Z=z)$ and test construction. An example is in Section 4 and proofs are given in Section 7.

2. Maximal nonlinear conditional correlation. In this section, a more general definition of the maximal nonlinear conditional correlation $\rho_1(X,Y|Z)$ will be given. Note that in the definition of the $\rho_i(X,Y|Z)$'s in Section 1, one can take $f_0(X,Z) = 1 = g_0(Y,Z)$, which gives that $\rho_0(X,Y|Z) = 1$, and then $\rho_1(X,Y|Z)$ can be defined as $E(f_1(X,Z)g_1(Y,Z)|Z)$ if there exists $(f_1,g_1) \in S_0$ such that

    $E(f(X,Z)g(Y,Z)|Z) \le E(f_1(X,Z)g_1(Y,Z)|Z)$ for every $(f,g) \in S_0$,

where $S_0$ is the collection of pairs of functions $(f,g)$ that satisfy (1.1), (1.2) and $E(f(X,Z)|Z) = 0 = E(g(Y,Z)|Z)$. Without assuming the existence of $(f_1,g_1)$, it is reasonable to define $\rho_1(X,Y|Z)$ as

(2.1)  $\sup_{(f,g)\in S_0} E(f(X,Z)g(Y,Z)|Z)$,

if the supremum can be defined.

The above approach can be considered as a "pointwise" approach. Indeed, when Z takes values in a countable set $\mathcal{Z}$, for each $z \in \mathcal{Z}$, one may define $\rho_1(X,Y|Z=z)$ as

(2.2)  $\sup_{(f,g)\in S_0} E(f(X,z)g(Y,z)|Z=z)$,

then the $\rho_1(X,Y|Z)$ defined using (2.2) is a measurable function and can serve as the supremum in (2.1). However, if $\mathcal{Z}$ is uncountable, then it is not clear whether the $\rho_1(X,Y|Z)$ defined using (2.2) is measurable. Therefore, we use the following fact to define the supremum in (2.1) so that it is well defined and is a measurable function.

Fact 1. There exists a sequence $\{(\alpha_n,\beta_n)\}$ in $S_0$ such that:

(i) The sequence $\{E(\alpha_n(X,Z)\beta_n(Y,Z)|Z)\}$ is nondecreasing, and

(ii) for every $(f,g) \in S_0$,

    $E(f(X,Z)g(Y,Z)|Z) \le \lim_{n\to\infty} E(\alpha_n(X,Z)\beta_n(Y,Z)|Z)$.

Furthermore, if (i) and (ii) hold for $\{(\alpha_n,\beta_n)\} = \{(\alpha_{n,1},\beta_{n,1})\}$ or $\{(\alpha_{n,2},\beta_{n,2})\}$, where $\{(\alpha_{n,1},\beta_{n,1})\}$ and $\{(\alpha_{n,2},\beta_{n,2})\}$ are sequences in $S_0$, then

(2.3)  $\lim_{n\to\infty} E(\alpha_{n,1}(X,Z)\beta_{n,1}(Y,Z)|Z) = \lim_{n\to\infty} E(\alpha_{n,2}(X,Z)\beta_{n,2}(Y,Z)|Z)$.

For the sake of brevity, from now on, some functions of (X, Z) or (Y, Z) may be expressed without the arguments (X, Z) or (Y, Z). To distinguish them, functions of (X, Z) have names starting only with $\alpha$ or $f$, and functions of (Y, Z) have names starting only with $\beta$ or $g$.



TESTING CONDITIONAL INDEPENDENCE 5 

Proof for Fact 1. We will first establish (2.3) if (i) and (ii) hold for $\{(\alpha_n,\beta_n)\} = \{(\alpha_{n,1},\beta_{n,1})\}$ or $\{(\alpha_{n,2},\beta_{n,2})\}$. Note that for each n, from (ii), we have that

    $E(\alpha_{n,2}\beta_{n,2}|Z) \le \lim_{n\to\infty} E(\alpha_{n,1}\beta_{n,1}|Z)$

and

    $E(\alpha_{n,1}\beta_{n,1}|Z) \le \lim_{n\to\infty} E(\alpha_{n,2}\beta_{n,2}|Z)$.

Take the limits in these two inequalities as $n \to \infty$, and we have (2.3).

It remains to find a sequence $\{(\alpha_n,\beta_n)\}$ in $S_0$ that satisfies (i) and (ii). Let $\{(\alpha_{n,0},\beta_{n,0})\}$ be a sequence in $S_0$ so that the sequence $\{E(\alpha_{n,0}\beta_{n,0})\}$ is nondecreasing and converges to $\sup_{(f,g)\in S_0} E(fg)$. We will construct $\{(\alpha_n,\beta_n)\}$ using $\{(\alpha_{n,0},\beta_{n,0})\}$ as follows. For n = 1, define $(\alpha_1,\beta_1) = (\alpha_{1,0},\beta_{1,0})$. For $n \ge 2$, define

    $(\alpha_n(X,Z),\beta_n(Y,Z)) = \begin{cases} (\alpha_{n,0}(X,Z),\beta_{n,0}(Y,Z)), & \text{if } E(\alpha_{n,0}\beta_{n,0}|Z) > E(\alpha_{n-1}\beta_{n-1}|Z); \\ (\alpha_{n-1}(X,Z),\beta_{n-1}(Y,Z)), & \text{otherwise.} \end{cases}$

Then $\{(\alpha_n,\beta_n)\}$ is a sequence in $S_0$ that satisfies (i), and the sequence $\{E\alpha_n\beta_n\}$ converges to $\sup_{(f,g)\in S_0} E(fg)$ since $E(\alpha_n\beta_n|Z) \ge E(\alpha_{n,0}\beta_{n,0}|Z)$. To see that $\{(\alpha_n,\beta_n)\}$ also satisfies (ii), for $(\alpha,\beta)$ in $S_0$, define

    $(\alpha_n^*,\beta_n^*) = \begin{cases} (\alpha,\beta), & \text{if } E(\alpha\beta|Z) > \lim_{n\to\infty} E(\alpha_n\beta_n|Z); \\ (\alpha_n,\beta_n), & \text{otherwise.} \end{cases}$

Then $\{(\alpha_n^*,\beta_n^*)\}$ is a sequence in $S_0$ such that

(2.4)  $\lim_{n\to\infty} E(\alpha_n^*\beta_n^*|Z) = \max\{E(\alpha\beta|Z),\ \lim_{n\to\infty} E(\alpha_n\beta_n|Z)\}$.

From the monotone convergence theorem, we have

(2.5)  $E \lim_{n\to\infty} E(\alpha_n^*\beta_n^*|Z) = \lim_{n\to\infty} E(\alpha_n^*\beta_n^*)$

and

(2.6)  $E \lim_{n\to\infty} E(\alpha_n\beta_n|Z) = \lim_{n\to\infty} E(\alpha_n\beta_n)$,

so (2.4) implies that

    $\sup_{(f,g)\in S_0} E(fg) \ge \lim_{n\to\infty} E(\alpha_n^*\beta_n^*) \ge \lim_{n\to\infty} E(\alpha_n\beta_n) = \sup_{(f,g)\in S_0} E(fg)$,

which gives

(2.7)  $\lim_{n\to\infty} E(\alpha_n^*\beta_n^*) = \lim_{n\to\infty} E(\alpha_n\beta_n)$.

If $E(\alpha\beta|Z) > \lim_{n\to\infty} E(\alpha_n\beta_n|Z)$ with positive probability, then (2.4), (2.5) and (2.6) together imply that $\lim_{n\to\infty} E(\alpha_n^*\beta_n^*) > \lim_{n\to\infty} E(\alpha_n\beta_n)$, which contradicts (2.7). Thus, (ii) holds. The proof of Fact 1 is complete. □

With Fact 1, the maximal nonlinear conditional correlation $\rho_1(X,Y|Z)$ can be redefined as follows.

Definition 1. $\rho_1(X,Y|Z) = \sup_{(f,g)\in S_0} E(f(X,Z)g(Y,Z)|Z)$, which is defined as $\lim_{n\to\infty} E(\alpha_n(X,Z)\beta_n(Y,Z)|Z)$, where $\{(\alpha_n,\beta_n)\}$ is a sequence in $S_0$ that satisfies (i) and (ii) in Fact 1.

Below are some remarks on $\rho_1(X,Y|Z)$.

1. If there exists $(f_1,g_1)$ in $S_0$ such that $E(f_1g_1|Z) \ge E(fg|Z)$ for all $(f,g) \in S_0$, then $\rho_1(X,Y|Z) = E(f_1g_1|Z)$ using Definition 1. To see this, let $\{(\alpha_n,\beta_n)\}$ be a sequence in $S_0$ that satisfies (i) and (ii) in Fact 1. Then $\rho_1(X,Y|Z) = \lim_{n\to\infty} E(\alpha_n\beta_n|Z)$, so $E(f_1g_1|Z) \le \rho_1(X,Y|Z)$ by (ii). Also, $E(f_1g_1|Z) \ge E(\alpha_n\beta_n|Z)$ for every n, so $E(f_1g_1|Z) \ge \rho_1(X,Y|Z)$. Therefore, $\rho_1(X,Y|Z) = E(f_1g_1|Z)$ and Definition 1 can be viewed as a generalized version of the definition of $\rho_1(X,Y|Z)$ given in Section 1.

2. $\rho_1(X,Y|Z)$ satisfies properties (P1)-(P5).

3. When X is a function of Y and Z or Y is a function of X and Z, it is not necessary that $\rho_1(X,Y|Z) = 1$. For instance, suppose that X and Z are independent standard normal random variables and $Y = X I_{(0,\infty)}(Z)$, then $\rho_1(X,Y|Z) = I_{(0,\infty)}(Z)$.

4. Let $\rho_1(X,Y)$ be the largest canonical coefficient from the CA between $H_1 = \{f(X) - Ef(X): Ef^2(X) < \infty\}$ and $H_2 = \{g(Y) - Eg(Y): Eg^2(Y) < \infty\}$. Then $\rho_1(X,Y|Z) = \rho_1(X,Y)$ if (X, Y) and Z are independent.

5. Let $\rho_1(X,Y)$ be as defined in item 4. It is stated in [3] that when the joint distribution of X and Y is bivariate normal

    $N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$,

$\rho_1(X,Y) = |\rho|$. This result implies that, when the joint distribution for X, Y and Z is multivariate normal and X and Y are both univariate,

    $\rho_1(X,Y|Z) = \frac{|E((X - E(X|Z))(Y - E(Y|Z))|Z)|}{(E((X - E(X|Z))^2|Z))^{1/2}(E((Y - E(Y|Z))^2|Z))^{1/2}} = \frac{|E(X - E(X|Z))(Y - E(Y|Z))|}{(E(X - E(X|Z))^2)^{1/2}(E(Y - E(Y|Z))^2)^{1/2}}$,

which also equals the absolute value of the usual partial correlation coefficient.
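
To make the Gaussian case in item 5 concrete, the following sketch computes the absolute partial correlation in two standard ways: from the conditional covariance of (X, Y) given Z (a Schur complement), and from the precision matrix. The covariance matrix is a made-up example and the function names are hypothetical; nothing here is taken from the paper beyond the identity being illustrated.

```python
import numpy as np


def residual_correlation(cov):
    """Correlation of X - E(X|Z) and Y - E(Y|Z) for jointly normal (X, Y, Z).

    Coordinates 0 and 1 of `cov` are the scalar X and Y; the rest form Z.
    """
    s_xy = cov[:2, :2]
    s_xz = cov[:2, 2:]
    s_zz = cov[2:, 2:]
    cond_cov = s_xy - s_xz @ np.linalg.solve(s_zz, s_xz.T)  # Schur complement
    return cond_cov[0, 1] / np.sqrt(cond_cov[0, 0] * cond_cov[1, 1])


def partial_correlation(cov, i=0, j=1):
    """Partial correlation of coordinates i and j given all others, via the precision matrix."""
    precision = np.linalg.inv(cov)
    return -precision[i, j] / np.sqrt(precision[i, i] * precision[j, j])


# Made-up positive definite covariance of (X, Y, Z1, Z2); it is strictly diagonally dominant.
cov = np.array([[2.0, 0.6, 0.5, 0.2],
                [0.6, 1.5, -0.3, 0.4],
                [0.5, -0.3, 1.0, 0.1],
                [0.2, 0.4, 0.1, 1.2]])

rho1_gaussian = abs(residual_correlation(cov))
rho1_from_precision = abs(partial_correlation(cov))
print(rho1_gaussian, rho1_from_precision)
```

The two numbers agree, which reflects that in the multivariate normal case the conditional covariance of (X, Y) given Z does not depend on the value of Z.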

3. A test of conditional independence. Testing conditional independence is equivalent to testing $H_0: \rho_1(X,Y|Z) = 0$, which involves testing $H_{0,z}: \rho_1(X,Y|Z=z) = 0$ for different z's in the range of Z. Let $\mathcal{Z}$ be the range of Z. In this section, an estimator $\hat\rho(z)$ is proposed for estimating $\rho_1(X,Y|Z=z)$ for each $z \in \mathcal{Z}$, and for distinct points $z_1,\ldots,z_{n_Z}$ in $\mathcal{Z}$, the asymptotic joint distribution of $\hat\rho(z_1),\ldots,\hat\rho(z_{n_Z})$ under $H_0$ is derived to construct a test for testing $H_0$.

3.1. Estimation of $\rho_1(X,Y|Z=z)$. To estimate

    $\rho_1(X,Y|Z) = \sup_{(f,g)\in S_0} E(fg|Z)$,

the functions f and g in each pair $(f,g) \in S_0$ are approximated using basis functions. Suppose that there exist $A_1$, $A_2$ and $A_3$: subsets of the set of all positive integers and three sets of functions $\{\phi_{p,i}: 1 \le i \le p, p \in A_1\}$, $\{\psi_{q,j}: 1 \le j \le q, q \in A_2\}$ and $\{\theta_{r,k}: 1 \le k \le r, r \in A_3\}$ such that for $\alpha(X,Z)$ and $\beta(Y,Z)$ with finite second moments,

(3.1)  $\lim_{p,r\to\infty} \inf_{a(i,k)} E\Big(\alpha(X,Z) - \sum_{1\le i\le p,\,1\le k\le r} a(i,k)\phi_{p,i}(X)\theta_{r,k}(Z)\Big)^2 = 0$

and

(3.2)  $\lim_{q,r\to\infty} \inf_{b(j,k)} E\Big(\beta(Y,Z) - \sum_{1\le j\le q,\,1\le k\le r} b(j,k)\psi_{q,j}(Y)\theta_{r,k}(Z)\Big)^2 = 0$.

Also, suppose that for each (p,q), there exist coefficients $a_{p,0,i}$'s and $b_{q,0,j}$'s such that

(3.3)  $\sum_{1\le i\le p} a_{p,0,i}\phi_{p,i}(x) = 1 = \sum_{1\le j\le q} b_{q,0,j}\psi_{q,j}(y)$

for every x in the range of X and every y in the range of Y.

Let $S_1$ be the collection of all $(f,g)$'s with finite second moments and let $S_{1,p,q}$ be the collection of all $(f,g)$'s in $S_1$ such that $f(X,Z) = \sum_{i=1}^p a_{p,i}(Z)\phi_{p,i}(X)$ for some $a_{p,i}(Z)$'s, and $g(Y,Z) = \sum_{j=1}^q b_{q,j}(Z)\psi_{q,j}(Y)$ for some $b_{q,j}(Z)$'s. Then (3.1) and (3.2) together imply that $S_1$ can be approximated by $S_{1,p,q}$ for large p and q. Since $S_0 \subset S_1$, $S_0$ can be approximated by $S_{1,p,q}$ as well. With the additional condition (3.3), $S_0$ can be easily approximated using the subspace $S_{0,p,q} = S_0 \cap S_{1,p,q}$. Note that (3.1), (3.2) and (3.3) hold for certain basis functions, for example, the tensor product splines in [11].
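
Condition (3.3) says that the constant function 1 lies in the span of each basis. A minimal sketch of one basis with this property is given below, assuming a scalar X with range [0, 1] and piecewise-linear "hat" functions on equally spaced knots (degree-one B-splines). This particular basis and the name `hat_basis` are illustrative choices, not the basis used in the paper; here all the coefficients $a_{p,0,i}$ equal 1.

```python
import numpy as np


def hat_basis(x, p):
    """Evaluate p piecewise-linear hat functions on [0, 1] at the points x.

    Returns an array of shape (len(x), p); column i is the hat centred at knot i.
    Requires p >= 2.
    """
    x = np.asarray(x, dtype=float)
    knots = np.linspace(0.0, 1.0, p)
    width = knots[1] - knots[0]
    # Each hat equals 1 at its own knot and decays linearly to 0 at the neighbouring knots.
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / width, 0.0, None)


x_grid = np.linspace(0.0, 1.0, 201)
phi = hat_basis(x_grid, p=6)
row_sums = phi.sum(axis=1)   # sum_i phi_{p,i}(x) with all coefficients equal to 1
print(row_sums.min(), row_sums.max())
```

Because every hat function takes values in [0, 1], this basis is also bounded by 1 in absolute value.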

Assuming (3.1), (3.2) and (3.3), it is reasonable to define

    $\sup_{(f,g)\in S_{0,p,q}} E(fg|Z)$

and use it to approximate $\rho_1(X,Y|Z)$. To define $\sup_{(f,g)\in S_{0,p,q}} E(fg|Z)$, one may follow the same approach for defining $\sup_{(f,g)\in S_0} E(fg|Z)$, or simply note that there exists $(f_1,g_1) \in S_{0,p,q}$ such that

(3.4)  $E(f_1g_1|Z) \ge E(fg|Z)$ for all $(f,g) \in S_{0,p,q}$

and define $\sup_{(f,g)\in S_{0,p,q}} E(fg|Z) = E(f_1g_1|Z)$. The pair $(f_1,g_1)$ can be obtained as follows. Let

    $\Sigma_{\phi,p}(Z) = (E(\phi_{p,i}(X)\phi_{p,j}(X)|Z) - E(\phi_{p,i}(X)|Z)E(\phi_{p,j}(X)|Z))_{p\times p}$,

    $\Sigma_{\psi,q}(Z) = (E(\psi_{q,i}(Y)\psi_{q,j}(Y)|Z) - E(\psi_{q,i}(Y)|Z)E(\psi_{q,j}(Y)|Z))_{q\times q}$

and

    $\Sigma_{\phi,\psi,p,q}(Z) = (E(\phi_{p,i}(X)\psi_{q,j}(Y)|Z) - E(\phi_{p,i}(X)|Z)E(\psi_{q,j}(Y)|Z))_{p\times q}$.
Consider the following two cases:

(i) $\Sigma_{\phi,p}(Z)$ and $\Sigma_{\psi,q}(Z)$ are not zero matrices, and

(ii) at least one of $\Sigma_{\phi,p}(Z)$ and $\Sigma_{\psi,q}(Z)$ is a zero matrix.

In case (i), let $a_1 = (a_{1,1}(Z),\ldots,a_{1,p}(Z))^T$ and $b_1 = (b_{1,1}(Z),\ldots,b_{1,q}(Z))^T$ be such that $(a_1,b_1)$ is the pair of (a, b) that maximizes

    $a^T \Sigma_{\phi,\psi,p,q}(Z) b$

subject to

    $a^T \Sigma_{\phi,p}(Z) a = 1 = b^T \Sigma_{\psi,q}(Z) b$,

and then take

    $f_1(X,Z) = \sum_{i=1}^p a_{1,i}(Z)(\phi_{p,i}(X) - E(\phi_{p,i}(X)|Z))$

and

    $g_1(Y,Z) = \sum_{i=1}^q b_{1,i}(Z)(\psi_{q,i}(Y) - E(\psi_{q,i}(Y)|Z))$.

In case (ii), take $f_1(X,Z) = 0 = g_1(Y,Z)$. Then $(f_1,g_1) \in S_{0,p,q}$ and (3.4) holds. Denote $\sup_{(f,g)\in S_{0,p,q}} E(fg|Z)$ by $\rho_{p,q}(Z)$.

The following fact states that $\rho_1(X,Y|Z)$ can be reasonably approximated by $\rho_{p,q}(Z)$ if p and q are large.

Fact 2. Suppose that (3.1), (3.2) and (3.3) hold and $\{p_n\}$ and $\{q_n\}$ are sequences of positive integers that tend to $\infty$ as $n \to \infty$. Then

    $\lim_{n\to\infty} E(|\rho_1(X,Y|Z) - \rho_{p_n,q_n}(Z)|) = 0$.

Proof. Since $\rho_1(X,Y|Z) \ge \rho_{p_n,q_n}(Z)$ for every n, Fact 2 holds if for every $\varepsilon > 0$, there exists $N_0$ such that for $n \ge N_0$,

(3.5)  $\rho_1(X,Y|Z) \le \rho_{p_n,q_n}(Z) + \Delta_1$

for some $\Delta_1$ such that $E|\Delta_1| < \varepsilon$. To find such a $\Delta_1$, we will first look for a pair $(f_m,g_m) \in S_0$ such that $E(f_mg_m|Z) \approx \rho_1(X,Y|Z)$, and then find $(f_n^*,g_n^*) \in S_{0,p_n,q_n}$ such that $(f_n^*,g_n^*) \approx (f_m,g_m)$. Take

(3.6)  $\Delta_1 = E(f_mg_m|Z) - E(f_n^*g_n^*|Z) + \rho_1(X,Y|Z) - E(f_mg_m|Z)$,

then (3.5) holds and $E|\Delta_1|$ can be made small if m and n are large enough.

To find $(f_m,g_m) \in S_0$ such that $E(f_mg_m|Z) \approx \rho_1(X,Y|Z)$, let $\{(f_n,g_n)\}_{n=1}^\infty$ be a sequence in $S_0$ such that $\{E(f_ng_n|Z)\}$ is an increasing sequence and $\lim_{n\to\infty} E(f_ng_n|Z) = \rho_1(X,Y|Z)$. Let $\Delta_{2,n} = \rho_1(X,Y|Z) - E(f_ng_n|Z)$, then $\lim_{n\to\infty} E|\Delta_{2,n}| = 0$, which implies that for every $\delta > 0$, there exists m such that

(3.7)  $E|\Delta_{2,m}| < \delta$.

To find $(f_n^*,g_n^*) \in S_{0,p_n,q_n}$ such that $(f_n^*,g_n^*) \approx (f_m,g_m)$, note that it follows from (3.1) and (3.2) that for $n \ge N_0$, there exists some $(f_{n,1},g_{n,1}) \in S_{1,p_n,q_n}$ such that

(3.8)  $\sqrt{E(f_m - f_{n,1})^2} < \delta$ and $\sqrt{E(g_m - g_{n,1})^2} < \delta$.

Let $f_{n,2}(X,Z) = f_{n,1}(X,Z) - E(f_{n,1}|Z)$, $g_{n,2}(Y,Z) = g_{n,1}(Y,Z) - E(g_{n,1}|Z)$,

    $f_n^*(X,Z) = \frac{f_{n,2}(X,Z)}{\sqrt{E(f_{n,2}^2|Z)}} I_{(0,\infty)}(E(f_{n,2}^2|Z))$

and

    $g_n^*(Y,Z) = \frac{g_{n,2}(Y,Z)}{\sqrt{E(g_{n,2}^2|Z)}} I_{(0,\infty)}(E(g_{n,2}^2|Z))$,

then it follows from (3.3) that $(f_n^*,g_n^*) \in S_{0,p_n,q_n}$. To see that $(f_n^*,g_n^*) \approx (f_m,g_m)$, let $\Delta_3 = f_m - f_n^*$ and $\Delta_4 = g_m - g_n^*$, then it can be shown that

(3.9)  $E\Delta_3^2 \le 16\delta^2 + 8\delta$

and

(3.10)  $E\Delta_4^2 \le 16\delta^2 + 8\delta$.

Below we will verify (3.9) only since the verification for (3.10) is similar. Write $\Delta_3 = f_m - f_{n,2} + f_{n,2} - f_n^*$, then by (3.8),

(3.11)  $E(f_m - f_{n,2})^2 \le 4\delta^2$

since $E(f_m - f_{n,2})^2 \le 2(E(f_m - f_{n,1})^2 + E(f_{n,1} - f_{n,2})^2)$ and $(f_{n,1} - f_{n,2})^2 = (E((f_m - f_{n,1})|Z))^2 \le E((f_m - f_{n,1})^2|Z)$. Also,

    $E((f_n^* - f_{n,2})^2|Z) = (1 - \sqrt{E(f_{n,2}^2|Z)})^2 I_{(0,\infty)}(E(f_{n,2}^2|Z))$
    $\le |1 - E(f_{n,2}^2|Z)|$
    $= |E((f_m - f_{n,2})^2|Z) - 2E(f_m(f_m - f_{n,2})|Z)|$
    $\le E((f_m - f_{n,2})^2|Z) + 2\sqrt{E((f_m - f_{n,2})^2|Z)}$,

so

(3.12)  $E(f_n^* - f_{n,2})^2 \le 4\delta^2 + 4\delta$.

Therefore, (3.9) follows from (3.11), (3.12) and the inequality $E\Delta_3^2 \le 2(E(f_m - f_{n,2})^2 + E(f_{n,2} - f_n^*)^2)$.

Finally, the $\Delta_1$ in (3.6) is $E(f_n^*\Delta_4|Z) + E(g_n^*\Delta_3|Z) + E(\Delta_3\Delta_4|Z) + \Delta_{2,m}$, so it follows from (3.9), (3.10), (3.7) and the Cauchy inequality that

    $E|\Delta_1| \le 3\sqrt{16\delta^2 + 8\delta} + \delta$.

For $\varepsilon > 0$, one can choose $\delta$ so that $3\sqrt{16\delta^2 + 8\delta} + \delta < \varepsilon$, then $E|\Delta_1| < \varepsilon$ as required. The proof of Fact 2 is complete. □

Based on Fact 2, it is reasonable to estimate $\rho_1(X,Y|Z)$ using an estimator for $\rho_{p,q}(Z)$, where p and q are large. To estimate $\rho_{p,q}(Z)$, the following assumption is made:

(A1) There exists a version of the conditional distribution of (X, Y) given Z such that for every bounded function g(X, Y), E(g(X,Y)|Z) calculated using that version is a continuous function of Z.

From now on, we will use the version of conditional distribution in (A1) to obtain E(g(X,Y)|Z = z) for every bounded g and every z in the range of Z. If for each (p,q), $1 \le i \le p$, $1 \le j \le q$, $|\phi_{p,i}| \le 1$ and $|\psi_{q,j}| \le 1$, then each element in $\Sigma_{\phi,p}(z)$, $\Sigma_{\psi,q}(z)$ and $\Sigma_{\phi,\psi,p,q}(z)$ is a continuous function of z, and $\rho_{p,q}(z)$ is $\max_{a,b} a^T \Sigma_{\phi,\psi,p,q}(z) b$, where the maximum is taken over all vectors a and b such that

    $a^T \Sigma_{\phi,p}(z) a = 1 = b^T \Sigma_{\psi,q}(z) b$.

To estimate $\rho_{p,q}(z)$, we consider the estimator

    $\hat\rho_{p,q}(z) = \max_{a,b} a^T \hat\Sigma_{\phi,\psi,p,q}(z) b$,

where the maximum is taken over all vectors a and b such that

    $a^T \hat\Sigma_{\phi,p}(z) a = 1 = b^T \hat\Sigma_{\psi,q}(z) b$,
and $\hat\Sigma_{\phi,p}(z)$, $\hat\Sigma_{\phi,\psi,p,q}(z)$ and $\hat\Sigma_{\psi,q}(z)$ are obtained by replacing the conditional expectations in $\Sigma_{\phi,p}(z)$, $\Sigma_{\phi,\psi,p,q}(z)$ and $\Sigma_{\psi,q}(z)$ by their kernel estimators. Specifically, each element in $\Sigma_{\phi,p}(z)$, $\Sigma_{\phi,\psi,p,q}(z)$ and $\Sigma_{\psi,q}(z)$ is of the form $E(UV|Z=z) - (E(U|Z=z))(E(V|Z=z))$, where U and V are functions of X or Y, so each of $E(UV|Z=z)$, $E(U|Z=z)$ and $E(V|Z=z)$ is of the form $E(g(X,Y)|Z=z)$, which is estimated by

(3.13)  $\hat E(g(X,Y)|Z=z) = \frac{\sum_{i=1}^n g(X_i,Y_i)\,k_h(z - Z_i)}{\sum_{i=1}^n k_h(z - Z_i)}$,

where $k_h(z) = h^{-d}k_0(z/h)$ and $k_0$ is a kernel function on $R^d$ satisfying certain conditions which will be specified later. For each $z \in \mathcal{Z}$, to make $\hat\rho_{p,q}(z)$ a reasonable estimator for $\rho_1(X,Y|Z=z)$, we will take $p = p_n$, $q = q_n$ and $h = h_n$, where $p_n \to \infty$, $q_n \to \infty$ and $h_n \to 0$ as $n \to \infty$. The estimator $\hat\rho_{p_n,q_n}(z)$ will be abbreviated as $\hat\rho(z)$ for each $z \in \mathcal{Z}$.
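
The kernel estimator in (3.13) is a weighted average of the observed values $g(X_i,Y_i)$, with weights proportional to $k_h(z - Z_i)$. A minimal sketch follows, assuming a product Gaussian kernel for $k_0$ and simulated data; the kernel choice, the bandwidth, the data-generating model and the function names are all assumptions made for this illustration rather than choices taken from the paper.

```python
import numpy as np


def gaussian_product_kernel(u):
    """Product Gaussian kernel on R^d, used here as an assumed k_0; u has shape (n, d)."""
    u = np.atleast_2d(u)
    d = u.shape[1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=1)) / (2.0 * np.pi) ** (d / 2.0)


def kernel_cond_expectation(g_values, z_obs, z, h, kernel=gaussian_product_kernel):
    """Estimate E(g(X, Y) | Z = z) in the form of (3.13).

    g_values: shape (n,), the values g(X_i, Y_i)
    z_obs:    shape (n, d), the observations Z_i
    z:        shape (d,), the evaluation point
    h:        bandwidth
    """
    z_obs = np.asarray(z_obs, dtype=float)
    d = z_obs.shape[1]
    weights = kernel((np.asarray(z, dtype=float) - z_obs) / h) / h ** d  # k_h(z - Z_i)
    return weights @ np.asarray(g_values, dtype=float) / weights.sum()


# Simulated example: E(g | Z = z) = sin(2 pi z), so the target at z = 0.25 is 1.
rng = np.random.default_rng(1)
n = 5000
z_obs = rng.uniform(0.0, 1.0, size=(n, 1))
g_values = np.sin(2.0 * np.pi * z_obs[:, 0]) + 0.1 * rng.standard_normal(n)
estimate = kernel_cond_expectation(g_values, z_obs, z=np.array([0.25]), h=0.05)
constant_check = kernel_cond_expectation(np.ones(n), z_obs, z=np.array([0.5]), h=0.1)
print(estimate, constant_check)
```

The factor $h^{-d}$ cancels between numerator and denominator, so only the relative sizes of the kernel weights matter for the estimate.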

The estimator $\hat\rho(z)$ can be expressed in a different form that is easier to analyze. Let $X^*$ and $Y^*$ be random vectors of length $p_n$ and $q_n$, respectively, such that given the data $(X_1,Y_1,Z_1),\ldots,(X_n,Y_n,Z_n)$,

    $(X^*,Y^*) = ((\phi_{p_n,1}(X_\ell),\ldots,\phi_{p_n,p_n}(X_\ell))^T,\ (\psi_{q_n,1}(Y_\ell),\ldots,\psi_{q_n,q_n}(Y_\ell))^T)$

with probability $k_h(z - Z_\ell)/\sum_{i=1}^n k_h(z - Z_i)$ for $1 \le \ell \le n$. Then $\hat\Sigma_{\phi,\psi,p,q}(z) = EX^*Y^{*T} - EX^*EY^{*T}$, $\hat\Sigma_{\phi,p}(z) = EX^*X^{*T} - EX^*EX^{*T}$ and $\hat\Sigma_{\psi,q}(z) = EY^*Y^{*T} - EY^*EY^{*T}$, where the expectations are conditional expectations given the data. Therefore, the estimator $\hat\rho(z)$ is the largest canonical coefficient from the centered canonical analysis between $X^*$ and $Y^*$. Note that it follows from (3.3) that

(3.14)  $a_{n,*}^T X^* = 1 = b_{n,*}^T Y^*$,

where

    $a_{n,*} = (a_{p_n,0,1},\ldots,a_{p_n,0,p_n})^T$ and $b_{n,*} = (b_{q_n,0,1},\ldots,b_{q_n,0,q_n})^T$,

so $\hat\rho(z)$ can also be obtained from the noncentered canonical analysis between $X^*$ and $Y^*$. Let

    $V_{1,1}(z) = (E(\phi_{p_n,i}(X)\phi_{p_n,j}(X)|Z=z))_{p_n\times p_n}$,

    $V_{1,2}(z) = (E(\phi_{p_n,i}(X)\psi_{q_n,j}(Y)|Z=z))_{p_n\times q_n}$,

    $V_{2,2}(z) = (E(\psi_{q_n,i}(Y)\psi_{q_n,j}(Y)|Z=z))_{q_n\times q_n}$ and $V_{2,1}(z) = V_{1,2}(z)^T$.

For $1 \le i,j \le 2$, let $\hat V_{i,j}(z)$ be the estimator of $V_{i,j}(z)$ obtained by replacing the conditional expectations in $V_{i,j}(z)$ by their kernel estimators as in (3.13). Then $\hat V_{1,1}(z) = EX^*X^{*T}$, $\hat V_{1,2}(z) = EX^*Y^{*T}$, $\hat V_{2,2}(z) = EY^*Y^{*T}$, so $\hat\rho(z)$ is the square root of the largest eigenvalue of the matrix

    $\hat V_{1,2}(z)\hat V_{2,2}(z)^{-1}\hat V_{2,1}(z)\hat V_{1,1}(z)^{-1} - \hat V_{1,1}(z)a_{n,*}a_{n,*}^T$.

Also, $\rho_{p_n,q_n}(z)$ is the square root of the largest eigenvalue of the matrix

    $V_{1,2}(z)V_{2,2}(z)^{-1}V_{2,1}(z)V_{1,1}(z)^{-1} - V_{1,1}(z)a_{n,*}a_{n,*}^T$.
To simplify the above matrix expressions, some notation is introduced as follows. For a $(p_n + q_n) \times (p_n + q_n)$ matrix U, express U as

    $U = \begin{pmatrix} U_{1,1} & U_{1,2} \\ U_{2,1} & U_{2,2} \end{pmatrix}$,

where the dimension of $U_{1,1}$ is $p_n \times p_n$. For $1 \le i,j \le 2$, let $g_{i,j}$ be the mapping that maps U to $U_{i,j}$. For a $p_n \times 1$ vector a and a $(p_n + q_n) \times (p_n + q_n)$ matrix U, define

    $g(U,a) = g_{1,2}(U)g_{2,2}(U)^{-1}g_{2,1}(U)g_{1,1}(U)^{-1} - g_{1,1}(U)aa^T$,

if $g_{2,2}(U)$ and $g_{1,1}(U)$ are invertible. Let

    $\hat V(z) = \begin{pmatrix} \hat V_{1,1}(z) & \hat V_{1,2}(z) \\ \hat V_{2,1}(z) & \hat V_{2,2}(z) \end{pmatrix}$

and

    $V(z) = \begin{pmatrix} V_{1,1}(z) & V_{1,2}(z) \\ V_{2,1}(z) & V_{2,2}(z) \end{pmatrix}$,

then $\hat\rho(z)$ is the square root of the largest eigenvalue of $g(\hat V(z), a_{n,*})$ and $\rho_{p_n,q_n}(z)$ is the square root of the largest eigenvalue of $g(V(z), a_{n,*})$.
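
The eigenvalue characterization above can be sketched numerically. The code below forms a weighted second-moment matrix from basis values and kernel weights, applies the map g(U, a), and takes the square root of the largest eigenvalue. It assumes the toy bases $\phi = (1, x)^T$ and $\psi = (1, y)^T$, for which a = (1, 0)^T satisfies $a^T\phi = 1$ and the result should reduce to the absolute weighted correlation of x and y. The data, the weights and the function names are hypothetical.

```python
import numpy as np


def g_matrix(U, a, p):
    """g(U, a) = U12 U22^{-1} U21 U11^{-1} - U11 a a^T for a (p+q) x (p+q) matrix U."""
    U11, U12 = U[:p, :p], U[:p, p:]
    U21, U22 = U[p:, :p], U[p:, p:]
    return U12 @ np.linalg.solve(U22, U21) @ np.linalg.inv(U11) - U11 @ np.outer(a, a)


def rho_hat(phi, psi, weights, a):
    """Square root of the largest eigenvalue of g(V_hat, a).

    phi: (n, p) basis values at X_i; psi: (n, q) basis values at Y_i;
    weights: nonnegative weights summing to 1 (normalized kernel weights).
    """
    stacked = np.hstack([phi, psi])
    V_hat = stacked.T @ (weights[:, None] * stacked)   # weighted second moments
    eigvals = np.linalg.eigvals(g_matrix(V_hat, a, phi.shape[1]))
    return float(np.sqrt(max(eigvals.real.max(), 0.0)))


rng = np.random.default_rng(2)
n = 400
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)
weights = rng.uniform(0.5, 1.5, size=n)   # stand-in for kernel weights
weights /= weights.sum()

phi = np.column_stack([np.ones(n), x])
psi = np.column_stack([np.ones(n), y])
a = np.array([1.0, 0.0])                  # a^T phi = 1
rho = rho_hat(phi, psi, weights, a)

# Direct weighted correlation of x and y for comparison.
mx, my = weights @ x, weights @ y
cov_xy = weights @ (x * y) - mx * my
var_x = weights @ x ** 2 - mx ** 2
var_y = weights @ y ** 2 - my ** 2
weighted_corr = cov_xy / np.sqrt(var_x * var_y)
print(rho, abs(weighted_corr))
```

Subtracting $U_{1,1}aa^T$ removes the trivial eigenvalue 1 that the constant function contributes to the noncentered analysis, which is why the remaining largest eigenvalue matches the centered one.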

The matrix $g(\hat V(z), a_{n,*})$ can be replaced by a different matrix if basis change is performed. That is, suppose that

    $\phi = (\phi_{p_n,1},\ldots,\phi_{p_n,p_n})^T$ and $\psi = (\psi_{q_n,1},\ldots,\psi_{q_n,q_n})^T$

are replaced by $\phi^* = P_1\phi$ and $\psi^* = Q_1\psi$, respectively, and $\hat V(z)$ becomes $\hat V^*(z)$ after such a change is made. Then $\hat\rho(z)$ is also the square root of the largest eigenvalue of the matrix $g(\hat V^*(z), a^*)$, where $a^* = (P_1^{-1})^T a_{n,*}$ is a vector such that $(a^*)^T\phi^* = 1$. To make the expression for $g(V^*(z), a^*)$ simple, the matrices $P_1$ and $Q_1$ are chosen so that

(3.15)  $\phi_1^* = 1 = \psi_1^*$,

$g_{1,1}(V^*(z))$ and $g_{2,2}(V^*(z))$ are identity matrices, and for $1 < i \le p_n$ and $1 < j \le q_n$,

(3.16)  $E(\phi_i^*(X)\psi_j^*(Y)|Z=z) = \delta_{i,j}\sqrt{\lambda_i}$,

where $\phi_i^*$ and $\psi_j^*$ denote the ith element in $\phi^*$ and the jth element in $\psi^*$, respectively, $\delta_{i,j}$ denotes the Kronecker symbol and the $\lambda_i$'s are the eigenvalues of $g(V^*(z), a^*)$. Note that $(a^*)^T = (1,0,\ldots,0)$ with the above choice of $P_1$ and $Q_1$.

3.2. Asymptotic properties and a test of conditional independence. In this section, we will give asymptotic properties of the estimators $\hat\rho(z_k)$: $1 \le k \le n_Z$, where the $z_k$'s are distinct points in $\mathcal{Z}$. First, we will establish the consistency of the estimators, which relies on the fact that for each k, the two matrices $g(\hat V^*(z_k), a^*)$ and $g(V^*(z_k), a^*)$ are close, and their largest eigenvalues are $\hat\rho^2(z_k)$ and $\rho_{p_n,q_n}^2(z_k)$. The difference between $g(\hat V^*(z_k), a^*)$ and $g(V^*(z_k), a^*)$ depends on the difference of $\hat V^*(z_k)$ and $V^*(z_k)$, and the difference between some conditional expectation $E(g(X,Y,Z)|Z=z)$ and its kernel estimator $\hat E(g(X,Y,Z)|Z=z) = \sum_{i=1}^n w_{0,i}(z)g(X_i,Y_i,z)/\sum_{i=1}^n w_{0,i}(z)$, where $w_{0,i}(z) = k_0(h^{-1}(z - Z_i))$. To make it easier to derive the asymptotic properties of $\hat E(g(X,Y,Z)|Z=z)$, some regularity conditions on the distribution of (X, Y, Z) are imposed as follows.

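(R1) There exists a $\sigma$-finite measure $\mu$ such that for every $z \in \mathcal{Z}$, the conditional distribution of (X, Y) given Z = z has a p.d.f. $f(\cdot|z)$ with respect to $\mu$. Also, Z has a Lebesgue p.d.f. $f_Z$, and $f(x,y|z)$ and $f_Z(z)$ are twice differentiable with respect to z.

(R2) There exists a function h on $\mathcal{X} \times \mathcal{Y}$ such that

    $\sup_{z\in\mathcal{Z}} \max\Big(|f(x,y|z)|,\ \max_{1\le i\le d}\Big|\frac{\partial}{\partial z_i}f(x,y|z)\Big|,\ \max_{1\le i,j\le d}\Big|\frac{\partial^2}{\partial z_i\,\partial z_j}f(x,y|z)\Big|\Big) \le h(x,y)$

and $\int h(x,y)\,d\mu(x,y) < \infty$.

(R3) There exist constants $c_0$ and $c_1$ such that

    $\sup_{z\in\mathcal{Z}} \max\Big(|f_Z(z)|,\ \max_{1\le i\le d}\Big|\frac{\partial}{\partial z_i}f_Z(z)\Big|,\ \max_{1\le i,j\le d}\Big|\frac{\partial^2}{\partial z_i\,\partial z_j}f_Z(z)\Big|\Big) \le c_0$

and $1/f_Z(z) \le c_1$ for $z \in \mathcal{Z}$.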

Note that (R2) implies condition (A1) in Section 3.1. For the kernel function $k_0$, conditions (K1) and (K2) are assumed. The notation $\|\cdot\|$ denotes the Euclidean norm for a vector or the Frobenius norm for a matrix.

(K1) $k_0 \ge 0$, $\sup_u k_0(u) < \infty$, $\int k_0(u)\,du = 1$, $\int u k_0(u)\,du = 0$ and $\sigma_1 = \int \|u\|^2 k_0(u)\,du < \infty$.
(K2) There exist positive constants $\gamma_2$ and $\gamma_3$ that do not depend on d such that

    $k_0(a) \le (\gamma_2)^d e^{-\gamma_3\|a\|}$ for every $a \in R^d$.

Remark. If $k_0$ is a product kernel of the form $k_0(z_1,\ldots,z_d) = k_{00}(z_1)\cdots k_{00}(z_d)$, and

    $k_{00}(x) \le \gamma_2 e^{-\gamma_3|x|}$ for every $x \in R$,

then condition (K2) holds.

Under the above conditions, it is possible to control the difference between $\hat V^*(z_k)$ and $V^*(z_k)$ using the following result.

Lemma 1. Suppose that conditions (R1)-(R3) and (K1)-(K2) hold. Suppose that $f_{n,1},\ldots,f_{n,k_n}$ are functions defined on $\mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$, where $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ are the ranges of X, Y and Z, respectively. Let $f_Z$ be the p.d.f. of Z, $\hat f_Z(z) = (nh_n^d)^{-1}\sum_{i=1}^n k_0(h_n^{-1}(z - Z_i))$ for $z \in \mathcal{Z}$ and $c_K = 1/\int k_0^2(s)\,ds$. For $z \in \mathcal{Z}$, let $w_i(z) = n^{-1}h_n^{-d}w_{0,i}(z)/\hat f_Z(z)$ for $1 \le i \le n$ and

    $W_{n,j}(z) = \sqrt{nh_n^d c_K \hat f_Z(z)}\,\Big(\sum_{i=1}^n w_i(z)f_{n,j}(X_i,Y_i,z) - E(f_{n,j}(X,Y,z)|Z=z)\Big)$

for $1 \le j \le k_n$. Suppose that $\{h_n\}_{n=1}^\infty$ and $\{\varepsilon_n\}_{n=1}^\infty$ are sequences of positive numbers such that

    $c_{3,1}n^{-\alpha} \le h_n \le c_{3,2}n^{-\alpha}$

for some positive constants $c_{3,1}$ and $c_{3,2}$ and $1/(d+4) < \alpha < 1/d$, and $h_n/\varepsilon_n = O(n^{-\beta})$ for some $\beta > 0$. Let

(3.17)  $\mathcal{Z}(\varepsilon_n) = \{z \in \mathcal{Z}: \{z' \in R^d: \|z' - z\| \le \varepsilon_n\} \subset \mathcal{Z}\}$

and suppose that $z_1,\ldots,z_{n_Z}$ are points in $\mathcal{Z}(\varepsilon_n)$ such that

(3.18)  $\|z_k - z_{k^*}\| > h_n$ for $1 \le k,k^* \le n_Z$ and $k \ne k^*$

for large n and

(3.19)  $\max_{1\le k\le n_Z}\ \sup_{(x,y)\in\mathcal{X}\times\mathcal{Y}} |f_{n,j}(x,y,z_k)| \le C_n$ for some $C_n \ge 1$.

Suppose that $k_n n_Z C_n = O((\ln n)^{1/16})$. Then there exist $W_{n,1,j,k}$ and $W_{n,2,j,k}$: $1 \le j \le k_n$, $1 \le k \le n_Z$ such that the joint distribution of the $W_{n,1,j,k} + W_{n,2,j,k}$'s is the same as the joint distribution of the $W_{n,j}(z_k)$'s, $\sum_{j=1}^{k_n}\sum_{k=1}^{n_Z} W_{n,2,j,k}^2 = O_P(\exp(-(\ln n)^{1/9}))$, and the $W_{n,1,j,k}$'s are jointly normal with $EW_{n,1,j,k} = 0$ and for $1 \le j,\ell \le k_n$ and $1 \le k,k^* \le n_Z$

    $\mathrm{Cov}(W_{n,1,j,k}, W_{n,1,\ell,k^*}) = \begin{cases} \mathrm{Cov}(f_{n,j}(X,Y,z_k), f_{n,\ell}(X,Y,z_k)|Z=z_k), & \text{if } k = k^*; \\ 0, & \text{otherwise.} \end{cases}$

The proof of Lemma 1 is given in Section 7.1.

The differences between the $\hat V^*(z_k)$'s and the $V^*(z_k)$'s can be controlled by applying Lemma 1 and taking the $f_{n,j}(X,Y,z)$'s to be the functions $\phi_\ell^*(X)\phi_{\ell'}^*(X)$, $\phi_\ell^*(X)\psi_m^*(Y)$ and $\psi_m^*(Y)\psi_{m'}^*(Y)$, where $1 \le \ell \le \ell' \le p_n$ and $1 \le m \le m' \le q_n$. In such case, (3.19) holds under the following conditions.

(B1) For each (p,q), $|\phi_{p,k}| \le 1$ and $|\psi_{q,\ell}| \le 1$ for $1 \le k \le p$ and $1 \le \ell \le q$.
(B2) There exists $\{\delta_n\}$: a sequence of positive numbers such that for $1 \le k \le n_Z$, the smallest eigenvalues of the matrices $V_{1,1}(z_k)$ and $V_{2,2}(z_k)$ are greater than or equal to $\delta_n$.

Under the above conditions, the $\hat\rho(z_k)$'s are consistent, as stated in Theorem 3.1.



Theorem 3.1. Suppose that (3.1), (3.2), (3.3), conditions (R1)-(R3), (K1)-(K2) and (B1)-(B2) hold. Suppose that $\{h_n\}_{n=1}^\infty$ and $\{\varepsilon_n\}_{n=1}^\infty$ are sequences of positive numbers such that

    $c_{3,1}n^{-\alpha} \le h_n \le c_{3,2}n^{-\alpha}$

for some positive constants $c_{3,1}$ and $c_{3,2}$ and $1/(d+4) < \alpha < 1/d$, and $h_n/\varepsilon_n = O(n^{-\beta})$ for some $\beta > 0$. Suppose that $z_1,\ldots,z_{n_Z}$ are points in $\mathcal{Z}(\varepsilon_n)$ [defined in (3.17)] such that (3.18) holds and

(3.20)  $n_Z(p_n + q_n)^2 \max\{1, \delta_n^{-1}(p_n + q_n)\} = O((\ln n)^{1/16})$.

Then

(3.21)  $\sum_{k=1}^{n_Z} (\hat\rho^2(z_k) - \rho_{p_n,q_n}^2(z_k))^2 = O_P((nh_n^d)^{-1}(\ln n)^{1/4})$

and

(3.22)  $\Big(\sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k) - \sum_{k=1}^{n_Z} f_Z(z_k)\rho_{p_n,q_n}^2(z_k)\Big)^2 = O_P\Big(\frac{(\ln n)^{5/16}}{nh_n^d}\Big)$.

The proof of Theorem 3.1 is given in Section 7.2.

The next result deals with the asymptotic distribution of $\sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k)$ when X and Y are conditionally independent given Z.

Theorem 3.2. Suppose that the conditions in Theorem 3.1 hold and X and Y are conditionally independent given Z. Then there exist random variables $\tilde f_k$, $\tilde\rho^2(z_k)$ and $\lambda_k$: $1 \le k \le n_Z$ such that $\sum_{k=1}^{n_Z} \tilde f_k\tilde\rho^2(z_k)$ has the same distribution as $\sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k)$ and

    $nh_n^d c_K \sum_{k=1}^{n_Z} \tilde f_k\tilde\rho^2(z_k) - \sum_{k=1}^{n_Z} \lambda_k = O_P(\exp(-0.5(\ln n)^{1/9})(\ln n)^{3/32})$,

where the $\lambda_k$'s are independent and each $\lambda_k$ has the same distribution as the largest eigenvalue of a matrix $CC^T$, where C is a $(p_n - 1) \times (q_n - 1)$ matrix whose elements are i.i.d. N(0,1).

The proof of Theorem 3.2 is given in Section 7.3. The result in Theorem 3.2 is similar to that in Lemma 7.2 in [3]. The difference is that the asymptotic result here is derived as the sample size n, $p_n$ and $q_n$ all tend to $\infty$, while in [3], the result is derived as n tends to $\infty$, but $p_n$ and $q_n$ are held fixed.

Theorem 3.2 suggests the test that rejects the conditional independence hypothesis at approximate level a if

(3.23)  $nh_n^d c_K \sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k) > F_{n_Z,p_n,q_n}^{-1}(1 - a)$,

where $F_{n_Z,p_n,q_n}$ is the cumulative distribution function of $\sum_{k=1}^{n_Z} \lambda_k$.

One can estimate $F_{n_Z,p_n,q_n}^{-1}(1 - a)$ in (3.23) using simulated data, but it is also possible to use a normal approximation. Since the $\lambda_k$'s are i.i.d., the central limit theorem suggests the asymptotic normality of $\sum_{k=1}^{n_Z} \lambda_k$ and $\sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k)$. The following corollary gives the conditions that guarantee the asymptotic normality of $\sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k)$.

Corollary 1. Suppose that the conditions in Theorem 3.1 hold,

(3.24) $\lim_{n\to\infty} \frac{p_n^3 q_n^3}{n_Z(\max(p_n, q_n))^{1/3}} = 0$

and (i) or (ii) holds:

(i) $q_n = h(p_n)$, where h is an increasing function such that $\lim_{p\to\infty} h(p)/p$
exists and is greater than or equal to 1.

(ii) $p_n = h(q_n)$, where h is an increasing function such that $\lim_{q\to\infty} h(q)/q$
exists and is greater than or equal to 1.

Let $\mu_{p_n,q_n}$ and $\sigma^2_{p_n,q_n}$ be the mean and variance of the largest eigenvalue
of the matrix $CC^T$ in Theorem 3.2, respectively, and let the $\lambda_k$'s be as in
Theorem 3.2, then

(3.25) $\frac{(\max(p_n, q_n))^{1/6}}{\sigma_{p_n,q_n}} = O(1)$



and 



If X and Y are conditionally independent given Z, then

(3.27) $\frac{nh_n^d c_K \sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k) - n_Z\mu_{p_n,q_n}}{\sqrt{n_Z\sigma^2_{p_n,q_n}}} \xrightarrow{\mathcal{D}} N(0,1)$ as $n \to \infty$.
The proof of Corollary 1 is given in Section 7.4. Corollary 1 gives the test
that rejects the conditional independence hypothesis if

(3.28) $\frac{nh_n^d c_K \sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k) - n_Z\mu_{p_n,q_n}}{\sqrt{n_Z\sigma^2_{p_n,q_n}}} > \Phi^{-1}(1-\alpha),$

where $\Phi$ is the cumulative distribution function for the standard normal
distribution. Here, $\mu_{p_n,q_n}$ and $\sigma^2_{p_n,q_n}$ can be approximated by the sample
mean and variance of a random sample from the distribution of the largest
eigenvalue of the matrix $CC^T$.
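The normal-approximation rule in (3.28), with $\mu_{p_n,q_n}$ and $\sigma^2_{p_n,q_n}$ replaced by a sample mean and variance, can be sketched as follows. The sketch assumes that simulated draws of the largest eigenvalue of $CC^T$ are already available (`eigen_draws`) and that the statistic $nh_n^d c_K \sum_k \hat f_Z(z_k)\hat\rho^2(z_k)$ has been computed separately (`statistic`). These names and the function `rejects_by_test_1n` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def rejects_by_test_1n(statistic, n_z, eigen_draws, alpha=0.05):
    """Decision rule of (3.28) with mu and sigma^2 replaced by sample estimates."""
    mu_hat = fmean(eigen_draws)         # approximates mu_{p_n,q_n}
    sigma2_hat = variance(eigen_draws)  # approximates sigma^2_{p_n,q_n}
    standardized = (statistic - n_z * mu_hat) / math.sqrt(n_z * sigma2_hat)
    return standardized > NormalDist().inv_cdf(1.0 - alpha), standardized


if __name__ == "__main__":
    rng = random.Random(0)
    # Special case p_n = q_n = 2: C is 1 x 1, so the largest eigenvalue of
    # C C^T is the square of a single N(0,1) variable.
    draws = [rng.gauss(0.0, 1.0) ** 2 for _ in range(20000)]
    print(rejects_by_test_1n(statistic=12.0, n_z=5, eigen_draws=draws))
```

The function returns both the decision and the standardized statistic, so that the latter can be compared with other critical values if desired.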

To distinguish the two tests mentioned above, we will refer to the test
with rejection region in (3.28) as test 1N and the test with rejection region
in (3.23) as test 1. Note that under the conditions in Corollary 1, test 1
does not differ much from test 1N since the rejection region for test 1 can
be written as

$\frac{nh_n^d c_K \sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k) - n_Z\mu_{p_n,q_n}}{\sqrt{n_Z\sigma^2_{p_n,q_n}}} > T + \Phi^{-1}(1-\alpha),$

where

(3.29) $T = \frac{F^{-1}_{n_Z,p_n,q_n}(1-\alpha) - n_Z\mu_{p_n,q_n}}{\sqrt{n_Z\sigma^2_{p_n,q_n}}} - \Phi^{-1}(1-\alpha) = o(1)$

by (3.26). Therefore, both tests 1 and 1N are of asymptotic significance level
$\alpha$. Below we will discuss the consistency and asymptotic power of test 1N
only since the same properties of test 1 can be established similarly using
(3.29).

Suppose all the conditions in Theorem 3.1 hold. Then test 1N is also
consistent if the $z_k$'s are chosen in a way such that there exist a constant $c_3 > 0$
and a sequence $\{\eta_{1,n}\}_{n=1}^{\infty}$ such that $\eta_{1,n} > 0$ for every n, $\lim_{n\to\infty}\eta_{1,n} = 0$
and

(3.30) $\frac{1}{n_Z}\sum_{k=1}^{n_Z} f_Z(z_k)\rho^2_{p_n,q_n}(z_k) - c_3E\rho^2_{p_n,q_n}(Z) = O_P(\eta_{1,n}).$

To see that test 1N is consistent, note that $0 < \mu_{p_n,q_n} \le E\,\mathrm{tr}(CC^T)$ and
$\sigma^2_{p_n,q_n} \le E(\mathrm{tr}(CC^T))^2$, where $CC^T$ is as in Theorem 3.2. Therefore, $\mu_{p_n,q_n} = O(p_nq_n)$
and $\sigma^2_{p_n,q_n} = O(p_n^2q_n^2)$. Then it follows from (3.22), (3.30) and Fact 2 that

$\frac{1}{n_Z}\sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k) - c_3E\rho_1^2(X,Y|Z) = O_P\left(\frac{(\ln n)^{5/32}}{n_Z\sqrt{nh_n^d}}\right) + O_P(\eta_{1,n}) + c_3E\rho^2_{p_n,q_n}(Z) - c_3E\rho_1^2(X,Y|Z) = o_P(1),$

so

$\frac{nh_n^d c_K \sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k) - n_Z\mu_{p_n,q_n}}{\sqrt{n_Z\sigma^2_{p_n,q_n}}} \ge \frac{\sqrt{n_Z}\left(nh_n^d c_K\left(c_3E\rho_1^2(X,Y|Z) + o_P(1)\right) + O(p_nq_n)\right)}{c_{2,1}p_nq_n},$

where $c_{2,1} > 0$ is a constant. Thus, the left-hand side in (3.28) tends to $\infty$
as $n \to \infty$ when $E\rho_1^2(X,Y|Z) > 0$, which implies that the probability that
(3.28) holds tends to 1 if X and Y are not conditionally independent given
Z.

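Test 1N can also reject an alternative where $E\rho^2_{p_n,q_n}(Z)$ is small under
the conditions in Theorem 3.1. Indeed, for $\{\eta_{1,n}\}_{n=1}^{\infty}$ such that $\eta_{1,n} > 0$ for
every n, $\lim_{n\to\infty}\eta_{1,n} = 0$ and (3.30) holds, if

(3.31) $\frac{\max\left(\eta_{1,n}, (\ln n)^{5/32}/(n_Z\sqrt{nh_n^d})\right)}{E\rho^2_{p_n,q_n}(Z)} = o(1),$

then the probability that (3.28) holds tends to 1 since

$\frac{nh_n^d c_K \sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k) - n_Z\mu_{p_n,q_n}}{\sqrt{n_Z\sigma^2_{p_n,q_n}}} \ge \sqrt{n_Z}\left(nh_n^d c_K\left(c_3E\rho^2_{p_n,q_n}(Z) + O_P\left(\frac{(\ln n)^{5/32}}{n_Z\sqrt{nh_n^d}}\right) + O_P(\eta_{1,n})\right) + O(p_nq_n)\right) \times (c_{2,1}p_nq_n)^{-1},$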

where $p_nq_n/(nh_n^d E\rho^2_{p_n,q_n}(Z)) = O\left((\ln n)^{1/16}/(n_Z nh_n^d E\rho^2_{p_n,q_n}(Z))\right) = o(1)$ by
(3.20) and (3.31), and $p_nq_n/(\sqrt{n_Z}\,nh_n^d E\rho^2_{p_n,q_n}(Z)) = o(1)$. In summary, test
1N can reject an alternative where $E\rho^2_{p_n,q_n}(Z)$ tends to zero at a rate that
is slower than $\max\left(\eta_{1,n}, (\ln n)^{5/32}/(n_Z\sqrt{nh_n^d})\right)$, where $\eta_{1,n}$ is determined by
(3.30). An example that satisfies (3.30) and the conditions in Corollary 1 
will be given in Section 4. In that example, r]i in = p^n^ d . 

4. An example. In this section, an example is given to illustrate the
verification of the conditions in Corollary 1, assuming (R1)-(R3) and the
condition that there exists a positive constant $c_{1,1}$ such that

(4.1) $f_{X|Z}(x|z) \ge c_{1,1}$ and $f_{Y|Z}(y|z) \ge c_{1,1}$ for all $(x, y, z) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$,

where $f_{X|Z}(\cdot|z)$ and $f_{Y|Z}(\cdot|z)$ are conditional probability densities of X and
Y, respectively, given Z = z, with respect to Lebesgue measures.
Example 1. Suppose that X, Y and Z are random vectors that take
values in $[0,1]^{d_x}$, $[0,1]^{d_y}$ and $[0,1]^d$, respectively. Suppose that (R1)-(R3)
and (4.1) hold. Choose the basis functions as follows. Let A be the set of
all positive integers and $A(k) = \{m^k : m \in A\}$ for $k \in A$. For $k, i_1, \ldots, i_k \in A$
and $h_0 > 0$, let

$h_{k,h_0,i_1,\ldots,i_k}(x_1,\ldots,x_k) = \prod_{j=1}^{k} I_{A_{i_j,h_0}}(x_j)$ for $(x_1,\ldots,x_k) \in [0,1]^k$,

where

$A_{i_j,h_0} = (h_0(i_j - 1), h_0 i_j]$ if $i_j > 1$, and $A_{i_j,h_0} = [h_0(i_j - 1), h_0 i_j]$ if $i_j = 1$.

For $p, q, r \in A$, let

$\{\phi_{p,i} : 1 \le i \le p\} = \{h_{d_x,p^{-1/d_x},i_1,\ldots,i_{d_x}} : 1 \le i_1,\ldots,i_{d_x} \le p^{1/d_x}\},$

$\{\psi_{q,j} : 1 \le j \le q\} = \{h_{d_y,q^{-1/d_y},i_1,\ldots,i_{d_y}} : 1 \le i_1,\ldots,i_{d_y} \le q^{1/d_y}\}$

and

$\{\theta_{r,k} : 1 \le k \le r\} = \{h_{d,r^{-1/d},i_1,\ldots,i_d} : 1 \le i_1,\ldots,i_d \le r^{1/d}\}.$
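For concreteness, the product-indicator basis defined above can be evaluated as in the following sketch. It is only an illustration under stated assumptions: `m` plays the role of $p^{1/d_x}$ (so that $h_0 = 1/m$), and the function names and the lexicographic ordering of the cells are choices made here, not part of the example.

```python
import math


def cell_index(x, m):
    """1-based index i of the cell A_{i,1/m} that contains x in [0, 1].

    A_1 = [0, 1/m] and A_i = ((i - 1)/m, i/m] for i > 1.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return max(1, math.ceil(x * m))


def basis_values(point, m):
    """Values of all m**d product-indicator basis functions at a point of [0, 1]^d.

    Cells are ordered lexicographically in (i_1, ..., i_d); exactly one
    basis function equals 1 at any given point.
    """
    flat = 0
    for x in point:
        flat = flat * m + (cell_index(x, m) - 1)
    values = [0.0] * (m ** len(point))
    values[flat] = 1.0
    return values


if __name__ == "__main__":
    # d_x = 2 and p = 4, so m = p**(1/d_x) = 2 and there are four cells.
    print(basis_values([0.2, 0.9], m=2))  # [0.0, 1.0, 0.0, 0.0]
```

Because the cells are disjoint, the basis functions are orthogonal, which is why the matrices of conditional second moments used in the proof below are diagonal.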

Take $k_0$ to be the product kernel function such that

$k_0(z_1,\ldots,z_d) = k_{00}(z_1)\cdots k_{00}(z_d),$

where $k_{00}$ is the probability density function for the standard normal distribution. Let $h_n = n^{-\alpha}$, where $1/(d+4) < \alpha < 1/d$. Let $n_Z^*$ be the largest
number in A(d) such that $n_Z^* \le (\ln n)^{1/32}$, and let

$\{z_k : 1 \le k \le n_Z\} = \left\{\left(\frac{i_1}{(n_Z^*)^{1/d}},\ldots,\frac{i_d}{(n_Z^*)^{1/d}}\right) : 1 \le i_1,\ldots,i_d < (n_Z^*)^{1/d}\right\},$

so $n_Z = ((n_Z^*)^{1/d} - 1)^d$. Suppose that $\{p_n\}$ is a sequence in $A(d_x) \cap A(d_y)$
such that $\lim_{n\to\infty} p_n = \infty$ and $q_n = p_n$. If

(4.2) pi 2 < n z , 



then all the conditions in Corollary 1 hold. If 

l/d 



(4-3) p% < nf 



then (3.30) holds with rji tll = p^n z ^ d . 

Proof. We will first show that all the conditions in Corollary 1 hold
assuming (4.2). It is clear that (3.1), (3.2) and (3.3), and conditions (B1),
(K1) and (K2) hold.
To find the $\delta_n$ in condition (B2), note that for $z \in \mathcal{Z}$, the smallest eigenvalue of $V_{1,1}(z)$ is the minimum of $\{E(\phi_{p_n,i}(X)|Z = z) : 1 \le i \le p_n\}$, which
is the minimum of $\{E(h_{d_x,p_n^{-1/d_x},i_1,\ldots,i_{d_x}}(X)|Z = z) : 1 \le i_1,\ldots,i_{d_x} \le p_n^{1/d_x}\}$.
Under (4.1), for $m \in A$ and $1 \le i_1,\ldots,i_{d_x} \le m$,

$E(h_{d_x,1/m,i_1,\ldots,i_{d_x}}(X)|Z = z) = \int_{(i_1-1)/m}^{i_1/m} \cdots \int_{(i_{d_x}-1)/m}^{i_{d_x}/m} f_{X|Z}(x_1,\ldots,x_{d_x}|z)\,dx_{d_x}\cdots dx_1 \ge \frac{c_{1,1}}{m^{d_x}}.$

Take $m = p_n^{1/d_x}$, and we have that the smallest eigenvalue of $V_{1,1}(z)$ is at least
$c_{1,1}/p_n$. Similarly, $c_{1,1}/p_n$ is also a lower bound for the smallest eigenvalue
of $V_{2,2}(z)$ and (B2) holds with $\delta_n = c_{1,1}/p_n$. Furthermore, (3.20) holds since



nz(p n + q n ) 2 max{l, 5 n 1 (p n + q n )} = 0(nzPn) = 0( 



n, 



Finally, the $z_k$'s are in $\mathcal{Z}(\varepsilon_n)$ with $\varepsilon_n = (n_Z^*)^{-1/d}$ and $h_n/\varepsilon_n = O(n^{-\beta})$ for
$0 < \beta < \alpha$. For $1 \le k, k^* \le n_Z$ and $k \ne k^*$, $\|z_k - z_{k^*}\| \ge (n_Z^*)^{-1/d} \ge n^{-\alpha}$, so
(3.18) holds. Also, (3.24) holds since



PnQn 



n z (max(p n ,g n ))V3 




= o(l). 

Therefore, all the conditions in Corollary 1 hold for this example. 

The verification of (3.30) is based on the fact that there exist positive 
constants c^i and ijo such that 

( 44 ) \pl n , q S Z )- pln^ Z ')\^ C ^PnW Z - Z 'W if Pnll*-^ll <m- 

Below we will first check (3.30) assuming that (4.4) holds and then prove 
(4.4). Suppose that (4.3) holds. Let g n (z) = fz(z)pp n q n ( z )- Since fz is Lip- 
schitz continuous, (4.4) implies that there exists a constant > such 
that 

\g n {z) - g n {z')\ <c Afi p^\\z- z'\\ if pi II z-z' \\ <rj . 
Let {z\ +nz , . . . , z n * } be the set 



K) 1/d '""' K) 1/d 



1 < h, ■ ■ ■ M < (n z ) 1/d ^ D{z k :l<k< n z } c , 



then 



n z f I \ d f ( 1 \ 
if p^(n^) l l d < 770 - Since |(?n(z)| < cq by (R3) and there exists a positive
constant C4 3 depending on d such that 



n z - n z 



we have 



n 



<c^{n* z ) l / d , ifd>2; 
= 1, ifd=l, 



I Z fz( Z )p 2 pn , q n( Z ) dZ 



k=l 



n 



z 

nz \ n 



f z ldz 

tE*W- / 9n{z)dz\ 
l z k=l Jz J 



nz 



n ■ 



+ 1^-1 

nz 



< 



nz 



n* 7 

1 - , f 

— /\,9n{zk) ~ / 9n(z)dz 

n z k =i Jz 



9n(z)dz 
+ c ( 1 + 



Idz 



n z - n z 
nz 



< 



C4,4P" 
lid 



for some constant 0^4 > if p^(n* z ) l / d < rjQ. Since < n z , p\n z ^ d 
o(l), so 



»z 



n 



Izfz( Z )p 2 p n ,qn( Z ) dZ 



k=l 



f z ldz 



Op 



i/d 



and p^n z l ^ d = o(l). Take r\\, n =p^n z 
(3.30) holds. 

It remains to prove (4.4). Recall that for z £ Z, Pp n) q n {z) is the largest 
eigenvalue of g(V(z),a n ^), as mentioned in Section 3.1. Thus, \p% n q n (z) — 
Pp n)<z „(z')| is bounded by \\g(V(z),a n ,*) - g(V(z'),a n ^)\\. For l<i,j < 2, let 
g*j be as defined in (7.8) and let Ay = gt d (V(z')) - g*j(V(z)) for 1 < i,j < 
2, then from the fact that ||AB|| < for two matrices A and B, we 

have 



and C3 = (f z Idz) — 1, then 



\\g(V(z),a n *) - g(V(z'),a r , 



2 2 



2 2 



(4.5) < J] IKUjiVizM + \\A id \\) -Ull MjiViz) 

i=lj=l i=lj=l 

+ \\gi,i(ytf))-g 1)1 (y{z" 
The bounds for the $\|g^*_{i,j}(V(z))\|$'s are derived as follows. Since the elements
in V(z) are bounded by 1 and the smallest eigenvalue of $g_{i,i}(V(z))$ is at least
$c_{1,1}/p_n$ for $1 \le i \le 2$, we have

m^(\\gt fi (V(z))\\M,i(V(z))\\)<Pn, 

Ilfl*,i0^))ll 2 < 7 P J u =4- 

{ci,i/p n r c i,i 

and 

Ib2, 2 (^))ll<— • 

Cl,l 

To find bounds for ||fl'i,i(V(5/)) — gi,i(V(z)) || and ||Ajj||'s, note that from 
(R3), each element in gi,j(V(z')) — gi,j(V(z)) is bounded by yd J h(x, y) dfj,(x, y)\\z- 



z'\\, so 



max(||A li2 || ) ||A 2!l ||,|| 5l , 1 (y( 2 , ))-3i,i(^(^ 
<p n Vd / h(x,y)dfj,(x,y)\\z - z'\\. 



For 1 < i < 2, by Fact 4, 

||A . . \\9iW(zM 2 \\9i,i{V{z'))-9i,i{V{z) 



l-\\g!4V(z))\\\\9i,i(V^'))-9iAV(z) 

*\\!kmm\\9i,m*))-9v0r(*))\\<i,*> 

||A M || < 2 ^ Pn I h(x,y)d(j,(x,y)\\z - z'\\, 



if 



(4.6) ^ I h(x,y)dn(x, 



^Pn f ./ \ 7 / \ 1 1 ,„ 1 

M] z - z \\ < -. 
" 2 



To give a bound for ||a nj *||, note that the smallest eigenvalue of gi > i(V(z)) 
is at least c\^/p n and at most 

an,*9i,l{y{z))a n> * 1 



SO 



, Pn 
\0"n,* II — \ I 

C l,l 
From (4.5) and the above bounds for ||a
we have 



the WgtjiVizWs and ||A M -||'s, 



\\g(V(z),a n ^) - g(V(z'),a nj *)\\ < c^p^\\z - z' 



for some constant c±\ if (4.6) holds. Therefore, (4.4) holds and the proof for 
the results in Example 1 is complete. □ 

5. Simulation studies. In this section, results of several simulation experiments are presented. Those experiments are designed to demonstrate
the performance of test 1 introduced in Section 3.2.

In Section 3.2, test 1N is also introduced, but no simulation studies are
done for it in this section. The reason is as follows. Test 1N is constructed
based on the normal approximation for $\sum_{k=1}^{n_Z}\lambda_k$. Using the parameter set-up in Table 2, the selected $n_Z$ is only 4 or 5 and the normal approximation
for $\sum_{k=1}^{n_Z}\lambda_k$ is not expected to work well.

For simplicity, in all the simulation experiments here, X, Y, Z are one-dimensional and only the following distributions for (X, Y, Z) are considered.

(M1) $(X, Y) = (\Phi(Z\varepsilon_1), \Phi(Z\varepsilon_2))$, where $\varepsilon_1$, $\varepsilon_2$ and Z are independent, Z
follows the uniform distribution on [0, 1], and $\varepsilon_i$ follows the standard
normal distribution for i = 1, 2.

(M2) Z follows the standard normal distribution, and the conditional distribution of (X, Y) given Z = z is bivariate normal with mean $\mu$ and
covariance matrix $\Sigma$, where



and the $\rho(z)$ in (5.1) is taken to be $a(|1 - 2\Phi(z)|)$ with $a \in \{0, 0.1, 0.3\}$.
(M3) $(X, Y, Z) = (\Phi(X_0), \Phi(Y_0), \Phi(Z_0))$, where $Z_0$ follows the t-distribution
with 1 degree of freedom, and the conditional distribution of $(X_0, Y_0)$
given $\Phi(Z_0) = z$ is bivariate normal with mean $\mu$ and covariance matrix $\Sigma$, where $\mu$ and $\Sigma$ are as in (5.1) and the $\rho(z)$ in (5.1) is taken to
be $a(|1 - 2z|)$ with $a \in \{0, 0.1, 0.3\}$.

Here, (M1) is used for parameter selection and (M2) and (M3) are used for
checking the power of test 1. In (M1), X and Y are conditionally independent given Z. In (M2) and (M3), $\rho_1(X,Y|Z = z) = \rho(z)$ and $E\rho_1(X,Y|Z)$ is
proportional to a.
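To make the set-up concrete, the sketch below draws samples from (M1) and (M2). It is an illustration only. For (M2) it assumes that the conditional bivariate normal distribution has zero mean and unit variances, so that $\rho(z)$ is the conditional correlation. This assumption, like the function names, is made for the example and is not taken from (5.1).

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def sample_m1(n, rng):
    """Draw n triples (X, Y, Z) from (M1): X = Phi(Z e1), Y = Phi(Z e2), Z ~ U[0, 1]."""
    triples = []
    for _ in range(n):
        z = rng.random()
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        triples.append((PHI(z * e1), PHI(z * e2), z))
    return triples


def sample_m2(n, a, rng):
    """Draw n triples from (M2) with rho(z) = a * |1 - 2 Phi(z)|.

    Assumption: zero conditional means and unit conditional variances.
    """
    triples = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        rho = a * abs(1.0 - 2.0 * PHI(z))
        x = rng.gauss(0.0, 1.0)
        # Conditional on x, y is normal with mean rho * x and variance 1 - rho^2.
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        triples.append((x, y, z))
    return triples


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_m1(3, rng))
    print(sample_m2(3, a=0.3, rng=rng))
```

Under these assumptions $\Phi(Z)$ is uniform on [0, 1] in (M2), so the average conditional correlation is $a/2$, which agrees with the statement that $E\rho_1(X,Y|Z)$ is proportional to a.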

The details of parameter selection are given in Section 5.1 and the exper- 
imental results are given in Section 5.2. 



(5.1) 
5.1. Parameter selection. To apply test 1, certain parameters need to be
chosen, including the kernel function $k_0$, the kernel bandwidth $h_n$, the basis
functions $\phi_{p_n,i}$'s and $\psi_{q_n,j}$'s and the evaluation points $z_k$'s, which are chosen
as follows.

(S1) $k_0$ and the basis functions $\phi_{p,i}$'s and $\psi_{q,j}$'s are chosen as in Example 1 in Section 4 with $p_n = q_n = 2$. Since the basis functions are
supported on [0,1], if X, Y and Z do not take values in [0,1] [such
as in (M2)], then the data $\{(X_i, Y_i, Z_i)\}_{i=1}^n$ will be transformed to
$\{(\Phi(X_i), \Phi(Y_i), \Phi(Z_i))\}_{i=1}^n$ before applying test 1. The bandwidth $h_n$
is chosen to be the h that minimizes

(5.2) $\int_{0.143h^{0.121}}^{1-0.143h^{0.121}} E(\hat f_Z(z) - 1)^2\,dz$

over (0, 0.5], where $\hat f_Z$ is the kernel density estimator based on a sample
of size n from the uniform distribution on [0, 1] with kernel $k_0$ and
bandwidth h. The $h_n$'s used for different n's are given in Table 1.

The $z_k$'s are points in $I_n = [0.143h_n^{0.121}, 1 - 0.143h_n^{0.121}]$ such that
$z_k = 0.143h_n^{0.121} + (k-1)h_{0,n}$, where $h_{0,n}$ is a given positive number.
Here, the $\varepsilon_n$ is taken to be $0.143h_n^{0.121}$, so the $z_k$'s are chosen so that
they are at least $0.143h_n^{0.121}$ away from the boundary and the integral in (5.2)
is over $[0.143h^{0.121}, 1 - 0.143h^{0.121}]$.
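As a small illustration of this choice, the sketch below lists the evaluation points for a given bandwidth and spacing. It assumes that every point of the form $0.143h_n^{0.121} + (k-1)h_{0,n}$ that falls in $I_n$ is used. The function name is hypothetical.

```python
def evaluation_points(h_n, h0n):
    """Points z_k = eps + (k - 1) * h0n in [eps, 1 - eps], with eps = 0.143 * h_n**0.121."""
    if h0n <= 0.0:
        raise ValueError("h0n must be positive")
    eps = 0.143 * h_n ** 0.121
    points = []
    k = 1
    while eps + (k - 1) * h0n <= 1.0 - eps:
        points.append(eps + (k - 1) * h0n)
        k += 1
    return points


if __name__ == "__main__":
    # Bandwidths and spacings reported in Tables 1 and 2.
    print(len(evaluation_points(0.05935281, 0.16)))  # n = 10,000
    print(len(evaluation_points(0.06525282, 0.2)))   # n = 5000
```

With the values reported in Tables 1 and 2 this gives 5 points for n = 10,000 and 4 points for n = 5000, in line with the remark in Section 5 that the selected $n_Z$ is only 4 or 5.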

With the parameter set-up in (S1), it remains to choose $h_{0,n}$. The $h_{0,n}$ is
chosen to be the smallest multiple of 0.01 such that the distribution for
the test 1 statistic $nh_n^d c_K\sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k)$ based on 1000 samples of size n
from (M1) is similar to the distribution of $\sum_{k=1}^{n_Z}\lambda_k$ ($\chi^2$ with $n_Z$ degrees of
freedom), as stated in Theorem 3.2. The one-sample Kolmogorov-Smirnov
test is used to determine whether the two distributions are similar. The $h_{0,n}$'s used for n = 10,000 and n = 5000 are given in Table 2.

For the above procedure for selecting $h_{0,n}$, when n = 500 or n = 1000,
it seems that the distribution of $nh_n^d c_K\sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k)$ cannot be approximated well by the distribution of $\sum_{k=1}^{n_Z}\lambda_k$, regardless of what $h_{0,n}$ is used.
To overcome this problem, one may use local bootstrap to determine the
rejection region.

Table 1
Selected $h_n$'s for different n's

n      10,000      5000        1000        500
$h_n$  0.05935281  0.06525282  0.08533451  0.0983018

The idea of using local bootstrap is to draw samples $\{(X_i^*, Y_i^*, Z_i^*)\}_{i=1}^n$
from the distribution of (X*, Y*, Z*), where Z*'s distribution is close to
the distribution of Z and the conditional distributions of X* given Z* = z
and Y* given Z* = z are close to the conditional distributions of X given
Z = z and Y given Z = z, yet X* and Y* are conditionally independent
given Z*. Therefore, if X and Y are conditionally independent given Z,
then the local bootstrap resamples $\{(X_i^*, Y_i^*, Z_i^*)\}_{i=1}^n$ should behave like a
random sample from (X, Y, Z). One can then compute the test 1 statistic
$nh_n^d c_K\sum_{k=1}^{n_Z} \hat f_Z(z_k)\hat\rho^2(z_k)$ for the original sample and for each local bootstrap resample. If the statistic computed based on the original sample is larger than
$100(1 - \alpha)\%$ of the statistics computed based on the local bootstrap resamples,
then the conditional independence hypothesis is rejected at level $\alpha$.
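The decision rule just described can be sketched as follows. The function name is hypothetical, and the statistics for the original sample and for the resamples are assumed to have been computed elsewhere.

```python
def bootstrap_rejects(original_stat, bootstrap_stats, alpha=0.05):
    """Reject when the original statistic exceeds at least 100(1 - alpha)% of the bootstrap statistics."""
    if not bootstrap_stats:
        raise ValueError("at least one bootstrap statistic is required")
    n_below = sum(1 for s in bootstrap_stats if s < original_stat)
    return n_below >= (1.0 - alpha) * len(bootstrap_stats)


if __name__ == "__main__":
    stats = [float(i) for i in range(100)]  # placeholder bootstrap statistics
    print(bootstrap_rejects(97.5, stats))   # True: larger than 98 of the 100 values
    print(bootstrap_rejects(50.5, stats))   # False: larger than only 51 of them
```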

The local bootstrap procedure used here is the same as the one proposed by Paparoditis and Politis [8] except that here the $Z_i$'s are not lagged
variables. For a given sample $\{(X_i, Y_i, Z_i)\}_{i=1}^n$, a local bootstrap resample
$\{(X_i^*, Y_i^*, Z_i^*)\}_{i=1}^n$ is generated as follows.

• Step 1. Draw a random sample $(Z_1^*, \ldots, Z_n^*)$ from the empirical cumulative
distribution function $\hat F_Z$, where

$\hat F_Z(z) = \frac{1}{n}\sum_{i=1}^{n} I_{(-\infty,z]}(Z_i).$

• Step 2. For $1 \le i \le n$, for each $Z_i^*$ from Step 1, draw $X_i^*$ and $Y_i^*$ independently from the empirical conditional cumulative distribution functions
$\hat F_{X|Z=Z_i^*}$ and $\hat F_{Y|Z=Z_i^*}$, respectively, where

$\hat F_{X|Z=Z_i^*}(x) = \frac{\sum_{t=1}^{n} k_0((Z_i^* - Z_t)/b)\, I_{(-\infty,x]}(X_t)}{\sum_{t=1}^{n} k_0((Z_i^* - Z_t)/b)}$

and

$\hat F_{Y|Z=Z_i^*}(y) = \frac{\sum_{t=1}^{n} k_0((Z_i^* - Z_t)/b)\, I_{(-\infty,y]}(Y_t)}{\sum_{t=1}^{n} k_0((Z_i^* - Z_t)/b)}.$

The parameters for test 1 with local bootstrap are chosen as follows. The
bandwidth b is taken to be $h_n^{0.4}$, $p_n = q_n = 2$ and $h_{0,n} = 0.4$, where $h_n$ is
as in Table 1.
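Steps 1 and 2 can be sketched in a few lines. The sketch is not the implementation used for the experiments: it assumes one-dimensional Z, takes the kernel in Step 2 to be the standard normal density, and uses hypothetical names throughout. Drawing $X_i^*$ from the weighted empirical distribution function amounts to picking one of the observed X values with probability proportional to its kernel weight.

```python
import math
import random


def local_bootstrap_resample(sample, b, rng):
    """One local bootstrap resample from a list of (x, y, z) triples with scalar z."""
    xs = [x for x, _, _ in sample]
    ys = [y for _, y, _ in sample]
    zs = [z for _, _, z in sample]
    resample = []
    for _ in range(len(sample)):
        # Step 1: draw Z* from the empirical distribution of the Z_i's.
        z_star = rng.choice(zs)
        # Kernel weights (normalising constants cancel); the Gaussian kernel is an assumption.
        weights = [math.exp(-0.5 * ((z_star - z) / b) ** 2) for z in zs]
        # Step 2: draw X* and Y* independently from the weighted empirical distributions.
        x_star = rng.choices(xs, weights=weights, k=1)[0]
        y_star = rng.choices(ys, weights=weights, k=1)[0]
        resample.append((x_star, y_star, z_star))
    return resample


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
    h_n = 0.0983018  # Table 1 value for n = 500, used here only as an example
    print(local_bootstrap_resample(data, b=h_n ** 0.4, rng=rng)[:3])
```

Each resample costs on the order of $n^2$ kernel evaluations in this form, which is consistent with the remark in Section 6 that obtaining the bootstrap resamples is time-consuming.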





Table 2
$h_{0,n}$'s for different n's

n          10,000  5000
$h_{0,n}$  0.16    0.2
5.2. Experiments. The objective of the first experiment is to compare
the power of test 1 with that of a Hellinger distance-based test proposed
by Su and White [13]. The critical value for Su and White's test can be
determined using the asymptotic distribution of the test statistic or using
local bootstrap. To distinguish between the two cases, we use test 2A to
denote the asymptotic distribution-based version of Su and White's test and
test 2B to denote the local bootstrap version. While test 2B is recommended
by Su and White [13], test 2A is used here to save computation time.

In this experiment, both tests 1 and 2A are carried out for 1000 random
samples of size $n = 10^4$, where the distribution of (X, Y, Z) is as in (M2)
or (M3). Under (M2), test 1 is applied to transformed data, as mentioned
in Section 5.1. Test 2A is applied to normalized data and the bandwidth
parameter in the kernel estimators in the test statistic is taken to be $n^{-1/8.5}$,
as in [13]. The power estimates based on data from (M2) and (M3) with
$n = 10^4$ are given in Table 3. The asymptotic significance level is 0.05. It is
shown in Table 3 that power estimates for test 1 when a = 0 and a = 0.1 are
larger than those for test 2A.

To explore the power performance of test 2B without actually running
the local bootstrap procedure, approximate critical values for test 2B under (M2) and (M3) are used. To obtain these approximate critical values,
note that under (M2) or (M3), for large n, a local bootstrap resample for
a = 0.1 or a = 0.3 is approximately distributed as a random sample for the
a = 0 case, so the critical value for test 2B can be approximated by the 95%
sample quantile of the 1000 test 2A statistics from the first experiment for
the a = 0 case. Then the power estimates for test 2B can be approximated
by the proportions of the 1000 test 2A statistics from the first experiment
under different alternatives that exceed the approximate critical values. The
approximate power estimates are given in Table 4. Note that the approximate power estimates for test 2B are often larger than the power estimates
for test 2A in Table 3, which suggests that test 2B is more powerful than
test 2A.
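The approximation described above can be sketched as follows. The function name is hypothetical, the two input lists are assumed to hold test 2A statistics computed elsewhere, and the sample quantile is taken to be a simple order statistic, which is one of several common conventions.

```python
import math


def approximate_power(null_stats, alt_stats, level=0.05):
    """Approximate critical value from the a = 0 statistics and the resulting power."""
    ordered = sorted(null_stats)
    # (1 - level) sample quantile of the statistics simulated under a = 0.
    index = min(len(ordered) - 1, math.ceil((1.0 - level) * len(ordered)) - 1)
    critical_value = ordered[index]
    # Proportion of statistics under the alternative that exceed the critical value.
    power = sum(1 for s in alt_stats if s > critical_value) / len(alt_stats)
    return critical_value, power


if __name__ == "__main__":
    null_stats = [float(i) for i in range(1, 101)]  # placeholder values
    alt_stats = [200.0] * 10 + [0.0] * 10           # placeholder values
    print(approximate_power(null_stats, alt_stats))
```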

Table 3
Power comparison between tests 1 and 2A

        a = 0             a = 0.1           a = 0.3
        Test 1  Test 2A   Test 1  Test 2A   Test 1  Test 2A
(M2)    0.049   0.028     0.65    0.076     1       0.95
(M3)    0.041   0.029     0.572   0.119     1       1

Table 4
Approximated power estimates for test 2B

        a = 0.1  a = 0.3
(M2)    0.128    0.971
(M3)    0.241    1

To investigate the performance of test 1 when the sample size is smaller,
in the next experiment, power estimates for test 1 are computed based on
1000 random samples of size n = 5000 from (M2) and (M3). The results are
given in Table 5. The results for $n = 10^4$ from the first experiment are also
included for comparison. The asymptotic significance level is 0.05 as before.
Table 5 shows that test 1 is more powerful when n is larger.

Finally, for smaller sample sizes such as n = 500 or n = 1000, since the approximation in Theorem 3.2 does not work well, the local bootstrap version
of test 1 is considered. Here 1000 samples of size n from (M2) are used, and
for each sample, 1000 local bootstrap resamples are used to determine the
rejection region. The level is 0.05. The power estimates for the test are given
in Table 6.

In the above results, the power estimates for test 1 are larger when a
is larger. This is expected. Under (M2) or (M3), $E\rho^2_{p_n,q_n}(Z) = E\rho^2_{2,2}(Z)$ increases as a increases ($a \in [0, 1]$), so test 1 should be more powerful for larger
a, if the approximations in (3.22) and (3.30) work. Table 7 gives the values
of $E\rho^2_{p_n,q_n}(Z)$ for a = 0.1 and 0.3. For (M2), the calculation of $E\rho^2_{p_n,q_n}(Z)$
is done for the transformed (X, Y, Z), which is obtained by applying the
function $\Phi$ to the original (X, Y, Z).

6. Concluding remarks. A test statistic for testing conditional independence based on maximal nonlinear conditional correlation is proposed. Two
tests, tests 1 and 1N, are constructed using the test statistic. Both tests are
consistent and have similar asymptotic properties, as discussed in Section
3.2. Some simulation experiments are carried out to check the performance
of test 1. The simulation results show that when the sample size $n = 10^4$,
the power of test 1 is comparable with that of test 2A. The simulation results also indicate that test 1 has better power when $E\rho^2_{p_n,q_n}(Z)$ is larger,
as expected.

Below are a few remarks. 



Table 5
Test 1 power estimates for n = 5000 and $n = 10^4$

            a = 0          a = 0.1        a = 0.3
            (M2)   (M3)    (M2)   (M3)    (M2)   (M3)
n = 5000    0.052  0.039   0.373  0.321   0.998  1
$n = 10^4$  0.049  0.041   0.65   0.572   1      1

Table 6
Power estimates for test 1 with local bootstrap

           a = 0   a = 0.1  a = 0.3
n = 500    0.041   0.071    0.309
n = 1000   0.033   0.099    0.531



1. Equation (3.20) requires that $p_n$, $q_n$ and $n_Z$ grow slowly compared with n.
The parameter selection result in Table 2 in Section 5 seems to agree with
such a requirement. With $n = 10^4$, $n_Z$ is only 5 and $p_n = q_n = 2$. When
$p_n = q_n = 3$, even with $h_{0,n} = 0.4$ (this corresponds to the smallest $n_Z$ for
$n = 10^4$), the distribution of the test statistic cannot be approximated
well by the distribution of $\sum_{k=1}^{n_Z}\lambda_k$.

2. The parameter selection criteria given in Section 5 need to be studied
to see whether the asymptotic properties of test 1 still hold using such
criteria.

3. When the distribution of the test statistic cannot be approximated well by
the distribution of $\sum_{k=1}^{n_Z}\lambda_k$, it is possible to use the local bootstrap version of
test 1. However, it takes a lot of time to obtain the bootstrap resamples,
so this approach is recommended when the sample size n is small.

4. In all theorems proved in this paper, it is assumed that the $(X_i, Y_i, Z_i)$'s
are i.i.d. It is also expected that test 1 works for some stationary weakly
dependent data such as the vector ARMA processes, where the central
limit theorem for the i.i.d. case still applies. However, to carry out the
details in the proofs, one needs the strong approximation result in Lemma
2, which is a stronger result than the central limit theorem and requires
a version of Lemma 5 that works for dependent data.

5. Test 1 can be modified to work for discrete Z. Modification is necessary
since the rate of convergence for each $\hat\rho(z_k)$ is faster in the discrete case.

6. In Lemma 1 and Theorems 3.1 and 3.2, the $z_k$'s are chosen in $\mathcal{Z}(\varepsilon_n)$ so
that they are $\varepsilon_n$-away from the boundary, and it is assumed that $h_n/\varepsilon_n = O(n^{-\beta})$ to ensure that certain error terms in the bias/variance calculation are negligible. For implementation, the condition $h_n/\varepsilon_n = O(n^{-\beta})$
still leaves some room for choosing $\varepsilon_n$. This problem can be eliminated
by using a kernel function with compact support, as pointed out by a
reviewer. In particular, if the kernel function $k_0$ is supported on $[-1, 1]^d$,
then one can simply take $\varepsilon_n = h_n$. In such case, even though the condition
$h_n/\varepsilon_n = O(n^{-\beta})$ does not hold, the results in Lemma 1 and Theorems 3.1
and 3.2 remain valid.

Table 7
$E\rho^2_{p_n,q_n}(Z)$ under (M2) and (M3)

       a = 0.1       a = 0.3
(M2)   0.001345575   0.01908246
(M3)   0.002044604   0.01765322

7. Proofs. 

7.1. Proof of Lemma 1. Recall that for 1 < j < k n , 



To prove the asymptotic normality of $W_{n,j}(z_k)$'s, we will approximate $W_{n,j}(z)$
using sums of i.i.d. random variables. For $1 \le i \le n$, let $w_{0,i}(z) = k_0(h_n^{-1}(z - Z_i))$ and let $\hat f_Z(z) = n^{-1}h_n^{-d}\sum_{i=1}^{n} w_{0,i}(z)$. Then $w_i(z) = n^{-1}h_n^{-d}w_{0,i}(z)/\hat f_Z(z)$.
For $1 \le j \le k_n$, let

n 

W nJ (z) = (nhZfzWr^CK^^wo^fnjiX^Yuz) 



i=l 



Ew ,i( z )fn,j(Xi,Yi,z)) 



and W n>kn+1 (z) = Jrt4^{fz{z))- l l 2 {f z {z)-Ef z {z)), then 

Wnj(z) = %&W nJ (z) + Jnh d c K f z (z)E(f n ,(X,Y,z)\Z = z)(^\ - l) 

\Jnh d c K f z (z) d 

+ YT\ E ( W 0,l[ Z )fn,j(XuY 1 ,z)) 



-E(f nd (X,Y,z)\Z = z)f z (z)) 

W nJ (z)+Y,Re 



4 



where W nJ (z) = W nJ (z) - W n , kn+1 (z)E(f nJ (X,Y, z)\Z 
R 1 , n j(z)=(^-l)w n , j (z), 

\fz{z) J 



z 



D / x \/ n K c Kfz{z) d 

fz(z) 



R3,n,j (z) 



-E(f n>j (X,Y,z)\Z = z)f z (z)), 

yftJ%^E(f nij (X,Y,z)\Z = z){f z {z) - f z {z)) 2 
fz(z)^Mz) 
and

Rt, nJ (z) = - ^^ E(f nd (X,Y,z)\Z = z){Ef z {z) - f z (z)). 

We will complete the proof by showing that the following results hold for
$T_n = \exp(-(\ln n)^{1/9})$.

(ci) Efci £^i(£ii W*)) J = °p( T n)- 

(C2) There exist random variables Nij >k and £ij )k : 1 < j < fc n , 1 < fe < 

such that the joint distribution of {Nij >k + £±j t k)j,k is the same as 
that of (W n j(zk))j t k, Nij^'s are jointly normal with EN±j )k = and 
CoY(N 1Jtk ,N 1Ak *) = CoY(W n>j (z k ),W n>e (z k *)) and E -=i E^ij fe = 
O p (T n ). 

(C3) There exist random variables N2j k and £2? ft : 1 < J < &n) 1 < & < n z 
such that the joint distribution of (iV2,i fe + £2 j,k)j k is the same as that 
of (Ni ; j ik )j >k , N2,j, k s are jointly normal with EN2j tk = and 

Cav(N 2jj>k ,N 2Ak *) 

Cov(/ nJ (X, y, z k )J n/ (X, Y, z k )\Z = z k ), if k = k*; 
0, otherwise, 

and EfciT£i4j,k = <> p {T n ). 

Note that Lemma 1 follows from (C1)-(C3) since one can construct ran- 
dom variables N2j tk , £~2,j,k, £i,j,k and Rh, n ,j,k : 1 < j < k n , 1 < k < nz on the 
same probability space such that the joint distribution of (N2,j,ki^2,j,k)j,k is 
the same as that of (N2j t k,£2,j,k)j,k, the joint distribution of (Sij >k ,N2,j,k + 
£2,j,k)j,k is the same as that of (£ij >k , -/Vij,fc)j,fc> and the joint distribution 
of (R5, n ,j,k,N2,j,k + £2,j,k + si,j,k)j,k is the same as that of (%2t=i Ri,n,j(z k ), 
W n j{z k ))j,k- Take W n ,ij,fc = ^2,i,fc and W n , 2 j,fc = £2,j,k+£i,j,k + #5,n,j,fc> then 
we have Lemma 1. 

To establish (C1)-(C3), we need certain expectations and covariances,
which are computed below. Under (R1)-(R3) and the conditions that
$\int u k_0(u)\,du = 0$ and $\sigma_0^2 = \int \|u\|^2 k_0(u)\,du < \infty$, for $z \in \mathcal{Z}(\varepsilon_n)$, we have

(7.1) 

= y, z)|Z = z)/z(z) + r nJtl (z)C n h*, 

where 



r n ,j,i(^) = co y h(x,y)dfj,(x,y) 

x (2do-Xj,i + 6n,j,2K 2 {2 + /i n )7^exp(-7 5 e2/ l -2)) ) 
$|\theta_{n,j,1}|, |\theta_{n,j,2}| \le 1$; and $\gamma_4$ and $\gamma_5$ are positive constants that depend on $\gamma_2$
and $\gamma_3$ only. Also, for $k \ne k^*$, $z_k, z_{k^*} \in \mathcal{Z}(\varepsilon_n)$, we have

= ^,fc^( /i n)" 2 (72) M exp(-0.573 ^ 2 |kfe - z k *f)Cl 

- fz (z k )fz (z k * )E(f n j (X, Y,z k )\Z = z k )E(f n/ (X, Y, z k * )\Z = z k *) 

(7-2) 

- fz{z k )E(f n j(X,Y,z k )\Z = z k )r nA1 (z k *)C n h n 

- fz(z k *)E(f n: e(X,Y,z k *)\Z = z k *)r n j A (z k )C n hl 

where \9j,£ jk , k *\ < 1. Finally, for z £ Z(e n ), 
(h^CoYiwo^fnjiXx^z^wo^fn^XuY^z)) 

= f z (z)E(f nJ (X, Y, z)f n/ (X, Y,z)\Z = z)J k 2 {u) du + r njA2 {z)C 2 n h n 

- h d n f 2 z (z)E(f nJ (X,Y,z)\Z = z)E(f n/ (X,Y,z)\Z = z) 

(7.3) 

- h d+2 C n r n ^ 1 (z)f z (z)E(f n/ (X,Y,z)\Z = z) 

- h d n +2 C n r nA1 (z)f z (z)E(f nd (X,Y, z)\Z = z) 
-h d n +i C 2 n r n , hl {z)r nA1 {z) 

and 

(7.4) h^Eiwo^fn^X^z)) 3 < Clc j k 3 (u)du, 

where 

\r n ,jM z )\< 2c o J h(x,y)d(i(x,y)(Vd J \\u\\k 2 (u) du + h^e^^ 

for some positive constants $\gamma_6$ and $\gamma_7$ that depend on $\gamma_2$ and $\gamma_3$ only. Below
we will prove (C1)-(C3).

Proof of (CI). Let S n = E k Li(fz(z k ) - fz(z k )) 2 a ndA n = {^< 
min{l, (2d)- 1 }}. From (7.1) and (7.3), ES n = 0(n z {hi + (n/^)" 1 )) = 
0{nz{n x h^)' 1 ) and 1/ fz(z k ) < <7 for all k, P(A^) — > as n — > 00 . From 
(7.1), on A n , 

k„ n z / 4 \ 2 

^=1 k =i \i=i j 
/ / k n n z \

< O(l) Sn E E W l 3 {z k ) + k n n z C 2 n (nh d n +A ) + k n C 2 n nh d n S\ 



2 

n I ' 



Kj=l k=l 



and it follows from (7.3) that 



(k n nz \ 
EE^(^) = 0(k n n z Cl). 
j=l k=l / 



Take 



k r, 2 r 2 

J i,n — T2 r K n nz^ n nh n , 

then (C1) holds with $T_n = \exp(-(\ln n)^{1/9})$ since $T_{1,n} = O(T_n)$. □

The proof of (C2) is based on the following lemma, which deals with the 
normal approximation of sum of i.i.d. random vectors. 

Lemma 2. Suppose that X±, . . . ,X n are i.i.d. random vectors in R dl 
with mean and variance S. Suppose that there exist positive constants 
C, d2 and a% such that 1 < 02 < 03 < C, \\X\\\ < C and -E||A"i|| fc < a\ for 
k = 2, 3. Then for T > 1, there exist random vectors S and Y on the same 
probability space such that S is distributed as (Xi + ••• + X n )/^/n, Y is 
multivariate normal with mean and variance £ and for n> (25/ (16a 2 .) + 
25di/12)C7 2 r 4 exp(3r 2 /16), 

P(\\S-Y\\>a)<a, 

*/ 

a ^ 33_ 1 75a| dle(dl+3)T 2 /8 + / 48 ^d le -3T 2 /(32al) 



The proof of Lemma 2 is given in Section 7.1.1. To prove (C2), note that
$W_{n,j}(z_k) = \sum_{i=1}^{n} (g_{n,j,k}(X_i, Y_i, Z_i) - Eg_{n,j,k}(X_i, Y_i, Z_i))/\sqrt{n}$, where

9n,j,k \Xi ; Y{ , Zi ) 

\/CK , ( Z k - Zi 

--k 



Vfz(z k )h d V K 
x (f nJ {X t ,Y u z k )- E(f n>j (X, Y,z k )\Z = z k )). 
< Q{l)C n y/KM



From (7.1)-(7.4), we have 

/ fcn n z \ V2 

I X] ^(sW,fc(^;> z i) - Eg n j tk (Xi,Yi, Zi)f\ 
\j=i fe=i / 

Vj=i fe=i / 

and 

/ / kn n z \ 3 / 2 \ V 3 

<c n ^/k~^ z K d i Q o{i). 

Note that for every constant M > 0, the condition 



n > 



/ 25 25fc n nA / MC n y/k n n z \ 2 4 3T | /i6 

U 12 A / 3>ne 



holds for large n with T3 jn = (lnn) 1//8 , so Lemma 2 is applicable. From Lem- 
ma 2, (C2) holds with any T n such that T^^n = 0(T n ), where 

7 > is a constant. Since T2 >n = 0(exp(— 7i(lnn) 1//8 )) for some constant 
7i > 0, (C2) holds with T n = exp(-(lnn) 1 / 9 ). 

The proof of (C3) is based on the following result. 

Fact 3. Suppose that A and B are d\ x d\ nonnegative definite matrices. 
Then 

\\VA- y/B\\ < df^WA- B\\. 

The proof of Fact 3 is given at the end of the proof of (C3). Note that
Fact 3 implies the following: suppose that $X_0$ and $Y_0$ are two $d_1 \times 1$ normal
vectors of mean 0 and covariance matrices A and B, respectively. Let Z
be a $d_1 \times 1$ normal vector whose elements are i.i.d. N(0, 1). Then $\sqrt{A}Z$ is
distributed as $X_0$ and $\sqrt{B}Z$ is distributed as $Y_0$ and

\\VJZ - s/BZf < \\VA-Vb\\ 2 \\Z\\ 2 < dl /2 \\A - B\\\\Z\\ 2 

= O p (dl /2 \\A-B\\). 
Therefore, (C3) holds if $\mathrm{Cov}(W_{n,j}(z_k), W_{n,\ell}(z_{k^*}))$ is close to
$\mathrm{Cov}(f_{n,j}(X,Y,z_k), f_{n,\ell}(X,Y,z_k)|Z = z_k)\,\delta_{k,k^*}$,
where $\delta_{k,k^*}$ is 1 if $k = k^*$ and is 0 otherwise. From (7.1)-(7.4), we have

(Cov(W nJ (z k ),W n , e (z k *)) 

j,£,k,k* 

- Cov ( f n>j (X, Y, Z k ), f n /(X,Y, Z k )\Z = Z k )S k ,k*) 

= h n C 2 n {k n n z ) 2 0(l), 



so (C3) holds with T n = exp(-(lnn) 1 / 9 ) since {k n n z f /2 ^h n C%(k n n z ) 2 = 
0(exp(-(lnn) 1 / 9 )). 

Proof of Fact 3. Consider first the case where $A$ is diagonal. Let $D$
be a diagonal matrix such that $B = Q^TDQ$ for some $Q$ such that $QQ^T = I$.
Let $D = \mathrm{diag}(\lambda_1,\ldots,\lambda_{d_1})$, $A = \mathrm{diag}(\alpha_1,\ldots,\alpha_{d_1})$, $Q = (q_{i,j})$ and $E = B - A = (e_{i,j})$.
Let $q_i$ be the $i$th column of $Q$, then $q_i^TDq_j = \alpha_i\delta_{i,j} + e_{i,j}$, where $\delta_{i,j} = 1$
for $i = j$ and $\delta_{i,j} = 0$, otherwise. Write $Dq_k = \sum_{j=1}^{d_1}(q_k^TDq_j)q_j$, then

di 

\\VDq k - y/ak~q k \\ 2 = ^2{\/%qj,k ~ 
3=1 



: ,fc) 2 



3=1 
di 



3=1 



1/2 / dl v 1/2 
2 J 



and 



/ di \ V 2 / di 

- |£( A i?7,* ~ a ^,fc) 2 J ^ 9 2 

fix \ V 2 

t=i j=i 

i=l j=l 
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3=1 \e=i / 



di / di \ V2 



<w 3/2 EE* 

Vi=l £=1 / 



so the result in Fact 3 holds if $A$ (or $B$) is diagonal. For general $A$ and
$B$, write $A = P^TA_0P$ and $B = Q^TDQ$, where $A_0$ and $D$ are diagonal and



The proofs of Fact 3 and Lemma 1 are complete. □ 

7.1.1. Proof of Lemma 2. The proof of Lemma 2 is based on several facts,
which are taken directly or adapted from some existing results and are
stated/proved below in Lemmas 3-5.

In the statements of Lemmas 3 and 4, $(S_0, d_0)$ is a metric space, $\mathcal{B}$ denotes
the collection of Borel sets in $(S_0, d_0)$, and for two measures $\mu_1$ and $\mu_2$ defined
on $\mathcal{B}$, $\rho_0(\mu_1, \mu_2)$ denotes the Prohorov distance of $\mu_1$ and $\mu_2$, which is defined
as

$$\rho_0(\mu_1,\mu_2) = \inf\{\varepsilon > 0: \mu_1(A) \le \mu_2(A^{\varepsilon}) + \varepsilon, \text{ for all } A \in \mathcal{B}\},$$

where $A^{\varepsilon} = \{x : d^*(x, A) < \varepsilon\}$ and $d^*(x, A) = \inf\{d_0(x, y) : y \in A\}$. Here are
Lemmas 3-5.

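Lemma 3 (Lemma 2.1 in Berkes and Philipp [1]). Suppose that $P_1$ and
$P_2$ are two measures defined on $\mathcal{B}$ and $\rho_0(P_1,P_2) < \alpha$. Then there exists a
probability measure $Q$ on the Borel sets of $S_0 \times S_0$ with marginals $P_1$ and
$P_2$ such that

$$Q\{(x,y): d_0(x,y) > \alpha\} \le \alpha.$$

Lemma 4 (Adapted from Lemma 2.2 in [1]). Suppose that $F$ and $G$ are
two distributions on $R^{d_1}$ with characteristic functions $f$ and $g$, respectively.
Then for $\sigma \in (0, 1]$ and $T > 0$, the Prohorov distance $\rho_0(F, G) \le \alpha$, where

$$\alpha = \left(\frac{T}{\pi}\right)^{d_1}\int |f(u) - g(u)|e^{-\sigma^2\|u\|^2/2}\,du + F(\{x: \|x\| > T/2\}) + \sigma T + 3(2^{d_1})e^{-3T^2/32}.$$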






Proof. Let $H$ be the $N(0,\sigma^2I)$ distribution on $R^{d_1}$, where $I$ is the
identity matrix and $\sigma > 0$. Let $F_1$ be the convolution of $F$ and $H$ and $G_1$
be the convolution of $G$ and $H$. Then

$$\rho_0(F,G) \le \rho_0(F_1,G_1) + 2\max\{r, H(\{x: \|x\| > r\})\} \quad \text{for every } r > 0. \tag{7.5}$$

Let $f_1$, $g_1$ and $h$ be the characteristic functions of $F_1$, $G_1$ and $H$, respectively,
and let $\gamma_F$ and $\gamma_G$ be the densities of $F_1$ and $G_1$, respectively. Then

$$|\gamma_F(x) - \gamma_G(x)| = \left|(2\pi)^{-d_1}\int e^{-iu^Tx}(f_1(u) - g_1(u))\,du\right| \le (2\pi)^{-d_1}\int |f(u) - g(u)||h(u)|\,du,$$

which implies that for every Borel set $B$ in $R^{d_1}$,

$$\begin{aligned}
F_1(B) - G_1(B) &\le F_1(B \cap \{x: \|x\| \le T\}) - G_1(B \cap \{x: \|x\| \le T\}) + F_1(\{x: \|x\| > T\}) \\
&\le \int_{\{x: \|x\| \le T\}} |\gamma_F(x) - \gamma_G(x)|\,dx + F(\{x: \|x\| > T/2\}) + H(\{x: \|x\| > T/2\}) \\
&\le \underbrace{\left(\frac{T}{\pi}\right)^{d_1}\int |f(u) - g(u)||h(u)|\,du + F(\{x: \|x\| > T/2\}) + H(\{x: \|x\| > T/2\})}_{II}.
\end{aligned}$$

Note that $II$ is an upper bound for the Prohorov distance $\rho_0(F_1, G_1)$, so for
$r \le T/2$, it follows from (7.5) that

$$\begin{aligned}
\rho_0(F, G) &\le II + 2r + 2H(\{x : \|x\| > r\}) \\
&\le \left(\frac{T}{\pi}\right)^{d_1}\int |f(u) - g(u)||h(u)|\,du + F(\{x: \|x\| > T/2\}) + 2r + 3P(\chi^2(d_1) > (r/\sigma)^2).
\end{aligned}$$

Since $h(u) = e^{-\sigma^2\|u\|^2/2}$ and

$$P(\chi^2(d_1) > A) \le e^{-tA}Ee^{t\chi^2(d_1)}\Big|_{t=3/8} = e^{-3A/8}(2^{d_1}) \quad \text{for every } A > 0, \tag{7.6}$$

Lemma 4 holds if $r = \sigma T/2$ and $\sigma \in (0,1]$. □
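The Chernoff-type bound (7.6) can be checked numerically. The sketch below assumes (7.6) reads $P(\chi^2(d_1) > A) \le e^{-3A/8}2^{d_1}$, which is the moment generating function $(1-2t)^{-d_1/2}$ evaluated at $t = 3/8$. To stay within the standard library it only treats even degrees of freedom, where the chi-square tail has a closed form; the function names are illustrative.

```python
import math

def chi2_sf_even(df, a):
    """P(chi2(df) > a) for even df, using the closed-form Poisson sum."""
    if df % 2 != 0:
        raise ValueError("the closed form used here needs even df")
    half = a / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

def chernoff_bound(df, a):
    """Right-hand side of (7.6): exp(-3a/8) * 2^df."""
    return math.exp(-3.0 * a / 8.0) * 2.0 ** df

# (df, A, exact tail, bound) for a few illustrative values
checks = [(df, a, chi2_sf_even(df, a), chernoff_bound(df, a))
          for df in (2, 4, 10) for a in (0.5, 5.0, 50.0)]
for df, a, tail, bound in checks:
    print(df, a, tail <= bound)
```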

Lemma 5 (Adapted from Theorem 1(a) in pages 204-208 in Gnedenko
and Kolmogorov [5]). Suppose that $X_1, \ldots, X_n$ are i.i.d. random vectors
with mean 0 and variance $\Sigma$. Suppose that $C$ and $\sigma$ are positive constants
such that $\|X_1\| \le C$, $\sigma \le C$ and $E\|X_1\|^k \le \sigma^k$ for $k = 2, 3$. Let $f_n$ be the
characteristic function of $(X_1 + \cdots + X_n)/\sqrt{n}$. Then

$$\left|f_n(u) - \exp\left(-\frac{1}{2}u^T\Sigma u\right)\right| \le \frac{0.25\|u\|^3\sigma^3}{\sqrt{n}}$$

if $\|u\| \le (0.4\sqrt{n})/C$.
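For intuition, the bound in Lemma 5 can be checked in a simple univariate case. The sketch below assumes the $X_i$ are Rademacher variables ($\pm 1$ with probability 1/2 each), so that $C = \sigma = 1$, $\Sigma = 1$ and the characteristic function of the normalized sum is $\cos(u/\sqrt{n})^n$. This example is chosen for illustration and is not taken from the paper.

```python
import math

def rademacher_cf_gap(u, n):
    """|f_n(u) - exp(-u^2/2)| for X_i = +/-1 with probability 1/2 each."""
    f_n = math.cos(u / math.sqrt(n)) ** n  # characteristic function of the normalized sum
    return abs(f_n - math.exp(-u * u / 2.0))

def lemma5_bound(u, n, sigma=1.0):
    """Right-hand side 0.25 |u|^3 sigma^3 / sqrt(n)."""
    return 0.25 * abs(u) ** 3 * sigma ** 3 / math.sqrt(n)

results = []
for n in (4, 25, 400):
    u_max = 0.4 * math.sqrt(n)  # admissible range |u| <= 0.4 sqrt(n) / C with C = 1
    for step in range(1, 41):
        u = u_max * step / 40.0
        results.append(rademacher_cf_gap(u, n) <= lemma5_bound(u, n))
print(all(results))
```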

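Proof. Consider first the case where $X_1$ is univariate. Let $U = f_1(u/\sqrt{n}) - 1$, then

$$U = \theta_1'\,\frac{EX_1^2}{2}\left(\frac{u}{\sqrt{n}}\right)^2$$

and

$$U = -\frac{EX_1^2}{2}\left(\frac{u}{\sqrt{n}}\right)^2 + \theta_1\frac{E|X_1|^3}{3!}\left(\frac{|u|}{\sqrt{n}}\right)^3,$$

where $|\theta_1'| \le 1$ and $|\theta_1| \le 1$. Suppose that $|u| \le (0.4\sqrt{n})/C$, then $|U| \le 0.1$
and

$$\log(1 + U) = U + 0.62\theta_2U^2,$$

where $|\theta_2| \le 1$. Let $V = \log f_n(u) + E(X_1^2)u^2/2 = E(X_1^2)u^2/2 + n\log(1 + U)$,
then

$$\begin{aligned}
V &= \frac{\Delta_1|u|^3\sigma^3}{6\sqrt{n}} + 0.62\left(\frac{\Delta_2\sigma^4u^4}{4n} + \frac{\Delta_3\sigma^5|u|^5}{6(\sqrt{n})^3} + \frac{\Delta_4\sigma^6u^6}{36n^2}\right) \\
&= \frac{|u|^3\sigma^3}{\sqrt{n}}\left(\frac{\Delta_1}{6} + 0.62\left(\frac{\Delta_2\sigma|u|}{4\sqrt{n}} + \frac{\Delta_3\sigma^2u^2}{6n} + \frac{\Delta_4\sigma^3|u|^3}{36(\sqrt{n})^3}\right)\right),
\end{aligned}$$

where $|\Delta_k| \le 1$ for $k = 1, 2, 3, 4$. Since $\sigma|u|/\sqrt{n} \le 0.4$,

$$V = \frac{\theta_3(0.25)|u|^3\sigma^3}{\sqrt{n}},$$

where $|\theta_3| \le 1$. Since $e^V = 1 + \theta_4|V|e^{|V|}$, where $|\theta_4| \le 1$,

$$\begin{aligned}
f_n(u) &= \exp\left(-\frac{E(X_1^2)u^2}{2}\right)(1 + \theta_4|V|e^{|V|}) \\
&= \exp\left(-\frac{E(X_1^2)u^2}{2}\right) + \theta_5\left(\frac{0.25|u|^3\sigma^3}{\sqrt{n}}\right)e^{|V| - E(X_1^2)u^2/2},
\end{aligned}$$

where $|\theta_5| \le 1$. To find an upper bound for $|V| - E(X_1^2)u^2/2$, note that

$$\frac{|\theta_1|E|X_1|^3|u|^3}{6\sqrt{n}} \le \frac{CEX_1^2|u|^3}{6\sqrt{n}} \le \frac{(0.4)u^2E(X_1^2)}{6},$$

$n|U| = |\theta_1'|u^2E(X_1^2)/2 \le u^2E(X_1^2)/2$ and

$$|n(\log(1 + U) - U)| = 0.62n|\theta_2U^2| \le 0.62(0.1)\left(\frac{E(X_1^2)u^2}{2}\right)$$

since $|U| \le 0.1$. Therefore,

$$\left|\frac{E(X_1^2)u^2}{2} + nU + n(\log(1 + U) - U)\right| - \frac{u^2E(X_1^2)}{2} \le \frac{(0.4)u^2E(X_1^2)}{6} + \frac{0.062E(X_1^2)u^2}{2} - \frac{u^2E(X_1^2)}{2} \le 0$$

and Lemma 5 holds for the univariate case. The result for the general case
can be obtained by applying the univariate result with $u$ and $X_i$ replaced
by $\|u\|$ and $Y_i = u^TX_i/\|u\|$. □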

Now we are ready to prove Lemma 2. 

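Proof of Lemma 2. Let $f_n$ be the characteristic function of $(X_1 + \cdots + X_n)/\sqrt{n}$ and $g$ be the characteristic function of $G$, the $N(0, \Sigma)$ distribution. From Lemmas 3-5, there exist random vectors $S$ and $Y$ on the same
probability space such that $S$ is distributed as $(X_1 + \cdots + X_n)/\sqrt{n}$, $Y$ is
multivariate normal with mean 0 and variance $\Sigma$ and

$$P(\|S - Y\| > \alpha_1) \le \alpha_1,$$

where

$$\begin{aligned}
\alpha_1 = {} & \sigma T + 3(2^{d_1})e^{-3T^2/32} + \frac{0.25\sigma_3^3}{\sqrt{n}}\left(\frac{2}{\pi}\right)^{d_1/2}\frac{T^{d_1}}{\sigma^{d_1+3}}E(\chi^2(d_1))^{3/2} \\
& + 2\left(\frac{2}{\pi}\right)^{d_1/2}\left(\frac{T}{\sigma}\right)^{d_1}P\left(\chi^2(d_1) > \frac{0.16n\sigma^2}{C^2}\right) + P(\|N(0,\Sigma)\| > T/2).
\end{aligned}$$

From the facts that $E(\chi^2(d_1))^{3/2} \le (E(\chi^2(d_1))^2)^{3/4}$ and $P(\|N(0, \Sigma)\| > T/2) \le P(\chi^2(d_1) > T^2/(4\sigma_2^2))$, (7.6) and the condition $\sigma_2 \ge 1$, we have

$$\begin{aligned}
\alpha_1 \le {} & \sigma T + 4(2^{d_1})e^{-3T^2/(32\sigma_2^2)} + \left(\frac{2}{\pi}\right)^{d_1/2}\frac{0.25\sigma_3^3T^{d_1}}{\sqrt{n}\,\sigma^{d_1+3}}(2d_1 + d_1^2)^{3/4} \\
& + 2\left(\frac{2}{\pi}\right)^{d_1/2}\left(\frac{T}{\sigma}\right)^{d_1}(2^{d_1})e^{-0.06n\sigma^2/C^2}.
\end{aligned}$$

Set $\sigma = T^{-1}e^{-3T^2/32}$, then $0 < \sigma < 1$, $T/\sigma \le 12e^{T^2/8}$ and $1/\sigma \le 3e^{T^2/8}$,
which, together with the fact that $(2/\pi)^{d_1/2}(2d_1 + d_1^2)^{3/4} \le 5$, gives that

$$\begin{aligned}
\alpha_1 &\le (1 + 4(2^{d_1}))e^{-3T^2/(32\sigma_2^2)} + \frac{33.75\sigma_3^3}{\sqrt{n}}(12)^{d_1}e^{(d_1+3)T^2/8} + 2(19.15)^{d_1}e^{d_1T^2/8}e^{-0.06n\sigma^2/C^2} \\
&\le \frac{33.75\sigma_3^3}{\sqrt{n}}(12)^{d_1}e^{(d_1+3)T^2/8} + (48)^{d_1}e^{-3T^2/(32\sigma_2^2)}
\end{aligned}$$

if $0.06n\sigma^2/C^2 \ge d_1T^2/8 + 3T^2/(32\sigma_2^2)$, which corresponds to $n \ge (25/(16\sigma_2^2) + 25d_1/12)C^2T^4\exp(3T^2/16)$ and we have Lemma 2. □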
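The numerical constants used in the last steps of this proof can be reproduced with a few lines of arithmetic. The sketch below is only a bookkeeping aid, with the constants read as follows (an assumption about the garbled source): the factor $(2/\pi)^{d_1/2}(2d_1 + d_1^2)^{3/4}$ is bounded by 5, $0.25 \times 5 \times 3^3 = 33.75$, $12 \times 2 \times \sqrt{2/\pi} \le 19.15$, $1 + 4(2^{d_1}) + 2(19.15)^{d_1} \le 48^{d_1}$, and $1/(8 \times 0.06) = 25/12$, $3/(32 \times 0.06) = 25/16$.

```python
import math

def moment_factor(d1):
    """(2/pi)^(d1/2) * (2 d1 + d1^2)^(3/4), claimed to be at most 5."""
    return (2.0 / math.pi) ** (d1 / 2.0) * (2.0 * d1 + d1 ** 2) ** 0.75

max_factor = max(moment_factor(d1) for d1 in range(1, 201))
first_const = 0.25 * 5 * 3 ** 3                 # 0.25 * 5 * 27, the constant 33.75
base = 24.0 * math.sqrt(2.0 / math.pi)          # 12 * 2 * sqrt(2/pi), rounded up to 19.15
sum_ok = all(1 + 4 * 2 ** d1 + 2 * 19.15 ** d1 <= 48 ** d1 for d1 in range(1, 101))
exp_consts = (1 / (8 * 0.06), 3 / (32 * 0.06))  # coefficients 25/12 and 25/16
print(max_factor, first_const, base, sum_ok, exp_consts)
```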

7.2. Proof of Theorem 3.1. To prove Theorem 3.1, we apply Lemma 1
by taking the $f_{n,j}(X, Y, z)$'s to be the functions $\phi^*_{\ell}(X)\phi^*_{\ell'}(X)$, $\phi^*_{\ell}(X)\psi^*_m(Y)$
and $\psi^*_m(Y)\psi^*_{m'}(Y)$, where $1 \le \ell \le \ell' \le p_n$ and $1 \le m \le m' \le q_n$. In such case,
(3.19) holds under conditions (B1) and (B2). To see this, for each $1 \le k \le n_z$
and $1 \le j \le p_n$, let $\phi^*_{n,j,k}$ be the $j$th component of $\phi^*$ when $z = z_k$. Then
$\phi^*_{n,j,k}(x) = \sum_{i=1}^{p_n} a_{n,i,j,k}\phi_{n,i}(x)$ for some $a_{n,i,j,k}$'s and

$$1 = E((\phi^*_{n,j,k}(X))^2 \mid Z = z_k)$$




Pn 



so

$$|\phi^*_{n,j,k}(x)| \le \sqrt{\sum_{i=1}^{p_n}a_{n,i,j,k}^2}\sqrt{\sum_{i=1}^{p_n}\phi_{n,i}^2(x)} \le \sqrt{p_n/\delta_n}.$$

Similarly, for each
$1 \le k \le n_z$ and $1 \le j \le q_n$, let $\psi^*_{n,j,k}$ be the $j$th component of $\psi^*$ when
$z = z_k$, then $|\psi^*_{n,j,k}(x)| \le \sqrt{q_n/\delta_n}$. Thus, (3.19) holds with $C_n = \max\{1, (p_n + q_n)/\delta_n\}$ and it follows from Lemma 1 that $\sum_{k=1}^{n_z}\|\hat V^*(z_k) - V^*(z_k)\|^2$ has the
same distribution as $\sum_{k=1}^{n_z}(nh_n^dc_Kf_Z(z_k))^{-1}\|W_{n,1,k} + W_{n,2,k}\|^2$, where the
$W_{n,1,k}$'s and $W_{n,2,k}$'s are random matrices such that each element in $W_{n,1,k}$
is normal with mean zero and variance bounded by $C_n^2 = (\max\{1, (p_n + q_n)/\delta_n\})^2$, and $\|W_{n,2,k}\|^2 = O_p(\exp(-(\ln n)^{1/9}))$. Therefore,

$$\sum_{k=1}^{n_z}\|\hat V^*(z_k) - V^*(z_k)\|^2 = O_p((nh_n^d)^{-1}(\ln n)^{1/8}). \tag{7.7}$$

k=l 

To control the difference between $g(\hat V^*(z_k), a^*)$ and $g(V^*(z_k), a^*)$ for $1 \le k \le n_z$, for a $(p_n + q_n) \times (p_n + q_n)$ matrix $U$, let

$$g^*_{i,j}(U) = \begin{cases} g_{i,j}(U), & \text{if } (i,j) = (1,2) \text{ or } (2,1); \\ (g_{i,j}(U))^{-1}, & \text{if } (i,j) = (1,1) \text{ or } (2,2). \end{cases} \tag{7.8}$$

For $1 \le k \le n_z$, let $\Delta_{i,j,k} = g^*_{i,j}(\hat V^*(z_k)) - g^*_{i,j}(V^*(z_k))$ for $1 \le i,j \le 2$. Then
from the fact that $\|AB\| \le \|A\|\|B\|$ for two matrices $A$ and $B$, we have

$$\begin{aligned}
\|g(\hat V^*(z_k), a^*) - g(V^*(z_k), a^*)\| \le {} & \prod_{i=1}^{2}\prod_{j=1}^{2}(\|g^*_{i,j}(V^*(z_k))\| + \|\Delta_{i,j,k}\|) - \prod_{i=1}^{2}\prod_{j=1}^{2}\|g^*_{i,j}(V^*(z_k))\| \\
& + \|g_{1,1}(\hat V^*(z_k)) - g_{1,1}(V^*(z_k))\|\,\|a^*(a^*)^T\|.
\end{aligned} \tag{7.9}$$

To control the $\Delta_{1,1,k}$ and $\Delta_{2,2,k}$ in (7.9), the following result is needed.

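Fact 4. Suppose that $A$ is a $p \times p$ invertible matrix and $\Delta = A - I_p$.
Then $\|A^{-1} - I_p + \Delta\| \le \|A^{-1} - I_p\|\,\|\Delta\|$ and

$$\|A^{-1} - I_p\| \le \frac{\|\Delta\|}{1 - \|\Delta\|} \quad \text{if } \|\Delta\| < 1.$$

Proof. Let $B = A^{-1} - I_p$. Then $B = -\Delta - B\Delta$, so $\|B + \Delta\| = \|B\Delta\| \le \|B\|\|\Delta\|$. Also,

$$\|B\| \le \|\Delta\|(1 + \|B\|). \tag{7.10}$$

Apply (7.10) and we have

$$\|B\| \le \frac{\|\Delta\|}{1 - \|\Delta\|} \quad \text{if } \|\Delta\| < 1.$$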
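Both inequalities in Fact 4 are easy to probe numerically. The sketch below assumes a submultiplicative matrix norm (the Frobenius norm is used here) and random perturbations $\Delta$ with $\|\Delta\| < 0.5$; the helper name `fact4_check` is illustrative only.

```python
import numpy as np

def fact4_check(delta):
    """Check both inequalities of Fact 4 for A = I + delta (Frobenius norm)."""
    p = delta.shape[0]
    eye = np.eye(p)
    b = np.linalg.inv(eye + delta) - eye  # B = A^{-1} - I_p
    nb, nd = np.linalg.norm(b), np.linalg.norm(delta)
    first = np.linalg.norm(b + delta) <= nb * nd + 1e-12
    second = (nb <= nd / (1.0 - nd) + 1e-12) if nd < 1 else True
    return first, second

rng = np.random.default_rng(1)
outcomes = []
for p in (2, 4, 8):
    for _ in range(100):
        m = rng.standard_normal((p, p))
        delta = 0.5 * rng.uniform() * m / np.linalg.norm(m)  # ||delta|| < 0.5
        outcomes.append(fact4_check(delta))
print(all(f and s for f, s in outcomes))
```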

Since $\|a^*\| = 1$ and for $1 \le k \le n_z$, $g_{1,1}(V^*(z_k)) = I_{p_n}$, $g_{2,2}(V^*(z_k)) = I_{q_n}$
and $\|g_{1,2}(V^*(z_k))\|^2 = \|g_{2,1}(V^*(z_k))\|^2 \le (p_n + q_n)$, from (7.9) and Fact 4, we
have

nz 

J2\\9(V*(z k ),a*)-g(V*(z k ),a*)\\ 2 

k=l 

= Op{(nh d n )- l {\nn) l l*n 2 z {p n + q n f) 

= Op((nh d n r 1 (lnn) 1 /*), 

which gives (3.21) since $|\hat\rho^2(z_k) - \rho^2_{p_n,q_n}(z_k)| \le \|g(\hat V^*(z_k), a^*) - g(V^*(z_k), a^*)\|$
for $1 \le k \le n_z$. (3.22) follows from (3.21) and the fact that $\sum_{k=1}^{n_z}(\hat f_Z(z_k) - f_Z(z_k))^2$ is $O_p(n_z(nh_n^d)^{-1})$. The proof of Theorem 3.1 is complete. □

7.3. Proof of Theorem 3.2. From Lemma 1, the joint distribution of
$\hat V^*(z_k): 1 \le k \le n_z$ is the same as that of $V^*(z_k) + (nh_n^dc_Kf_Z(z_k))^{-1/2}(W_{n,1,k} + W_{n,2,k}): 1 \le k \le n_z$, where

$$\sum_{k=1}^{n_z}\|W_{n,2,k}\|^2 = O_p(\exp(-(\ln n)^{1/9})) \tag{7.11}$$




and $W_{n,1,k}$'s are independent symmetric normal matrices of mean zero. To
describe the covariance structure of each $W_{n,1,k}$, let $\phi^* = (\phi^*_1, \ldots, \phi^*_{p_n})^T$,
$\psi^* = (\psi^*_1, \ldots, \psi^*_{q_n})^T$ and let $V_0$ be the $(p_n + q_n) \times (p_n + q_n)$ symmetric matrix
such that $g_{1,1}(V_0) = \phi^*(X)\phi^*(X)^T$, $g_{1,2}(V_0) = \phi^*(X)\psi^*(Y)^T$ and $g_{2,2}(V_0) = \psi^*(Y)\psi^*(Y)^T$. For $1 \le k \le n_z$ and $1 \le m, \ell \le p_n + q_n$, let $U_{k,m,\ell}$ and $V_{0,m,\ell}$
be the $(m,\ell)$th elements of $W_{n,1,k}$ and $V_0$, respectively, then

$$\mathrm{Cov}(U_{k,m,\ell}, U_{k,m',\ell'}) = \mathrm{Cov}(V_{0,m,\ell}, V_{0,m',\ell'} \mid Z = z_k)$$

for $(m,\ell), (m',\ell') \in \{(i,j): 1 \le i \le j \le (p_n + q_n)\}$. For $1 \le k \le n_z$, let $V_k = V^*(z_k) + (nh_n^dc_Kf_Z(z_k))^{-1/2}(W_{n,1,k} + W_{n,2,k})$ and

$$\begin{aligned}
A_1(z_k) &= g(V_k, a^*)g_{1,1}(V_k) \\
&= g_{1,2}(V_k)(g_{2,2}(V_k))^{-1}g_{2,1}(V_k) - g_{1,1}(V_k)a^*(a^*)^Tg_{1,1}(V_k),
\end{aligned}$$

and let $\rho_0^2(z_k)$ be the largest eigenvalue of $A_1(z_k)(g_{1,1}(V_k))^{-1}$, then the joint
distribution of $\hat\rho^2(z_k): 1 \le k \le n_z$ is the same as that of $\rho_0^2(z_k): 1 \le k \le n_z$.
For $1 \le i, j \le 2$ and $1 \le k \le n_z$, let $\Delta_{i,j,k} = g_{i,j}(V_k) - g_{i,j}(V^*(z_k))$, then from
(7.7),

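$$\sum_{k=1}^{n_z}\sum_{i=1}^{2}\sum_{j=1}^{2}\|\Delta_{i,j,k}\|^2 = O_p((nh_n^d)^{-1}(\ln n)^{1/8}) \tag{7.12}$$

and

$$\begin{aligned}
A_1(z_k) = {} & g_{1,2}(V^*(z_k))(g_{2,2}(V_k))^{-1}g_{2,1}(V^*(z_k)) \\
& - g_{1,1}(V_k)a^*(a^*)^Tg_{1,1}(V_k) + g_{1,2}(V^*(z_k))\Delta_{2,1,k} \\
& + \Delta_{1,2,k}g_{2,1}(V^*(z_k)) + \Delta_{1,2,k}\Delta_{2,1,k} \\
& - g_{1,2}(V^*(z_k))\Delta_{2,2,k}\Delta_{2,1,k} \\
& - \Delta_{1,2,k}\Delta_{2,2,k}g_{2,1}(V^*(z_k)) + R_{1,n,k},
\end{aligned} \tag{7.13}$$

where

$$\begin{aligned}
R_{1,n,k} = {} & \Delta_{1,2,k}((g_{2,2}(V_k))^{-1} - I_{q_n})\Delta_{2,1,k} \\
& + g_{1,2}(V^*(z_k))((g_{2,2}(V_k))^{-1} - I_{q_n} + \Delta_{2,2,k})\Delta_{2,1,k} \\
& + \Delta_{1,2,k}((g_{2,2}(V_k))^{-1} - I_{q_n} + \Delta_{2,2,k})g_{2,1}(V^*(z_k)).
\end{aligned}$$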
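The expansion (7.13) is a purely algebraic identity once $g_{1,1}(V^*(z_k)) = I_{p_n}$ and $g_{2,2}(V^*(z_k)) = I_{q_n}$, so it can be verified on random matrices. The sketch below is such a check under those two assumptions; the dimensions, the random blocks and the vector standing in for $a^*$ are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 3, 4
eye_p, eye_q = np.eye(p), np.eye(q)

def sym(m):
    """Symmetrize a square matrix."""
    return (m + m.T) / 2.0

# Population blocks: g11(V*) = I_p, g22(V*) = I_q, and an arbitrary g12(V*).
g12 = rng.standard_normal((p, q))
g21 = g12.T
a = rng.standard_normal((p, 1))  # stands in for a*
# Small perturbations Delta_{i,j,k}.
d11 = 0.1 * sym(rng.standard_normal((p, p)))
d22 = 0.1 * sym(rng.standard_normal((q, q)))
d12 = 0.1 * rng.standard_normal((p, q))
d21 = d12.T

g11_k, g22_k = eye_p + d11, eye_q + d22
inv22 = np.linalg.inv(g22_k)
# Left-hand side: A_1 computed directly from its definition.
a1 = (g12 + d12) @ inv22 @ (g21 + d21) - g11_k @ a @ a.T @ g11_k

# Right-hand side: the expansion (7.13) with remainder R_{1,n,k}.
r1 = (d12 @ (inv22 - eye_q) @ d21
      + g12 @ (inv22 - eye_q + d22) @ d21
      + d12 @ (inv22 - eye_q + d22) @ g21)
rhs = (g12 @ inv22 @ g21 - g11_k @ a @ a.T @ g11_k
       + g12 @ d21 + d12 @ g21 + d12 @ d21
       - g12 @ d22 @ d21 - d12 @ d22 @ g21 + r1)
identity_holds = bool(np.allclose(a1, rhs))
print(identity_holds)
```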

To simplify the expression for $A_1(z_k)$ in (7.13), we will make use of the
following properties.

(C4) The elements of the matrix $g_{1,2}(V^*(z_k))$ are zeros except that the
$(1,1)$th element is 1.

(C5) For $(i,j) \in \{(1,2), (2,1)\}$, $g_{i,j}(V^*(z_k))$'s first row (or first column) is
either the first row or the first column of $g_{i',j'}(V^*(z_k))$ for $(i',j') \ne (i,j)$.

(C6) The $(1,1)$th element in $g_{2,2}(V^*(z_k))$ is 1.

Here (C4) follows from the conditional independence assumption and (3.16),
and (C5) and (C6) follow from (3.15). From (C6), $g_{2,2}(V_k)$ can be expressed
as

$$g_{2,2}(V_k) = \begin{pmatrix} 1 & B_k^T \\ B_k & D_k \end{pmatrix}$$

for some matrices $B_k$ and $D_k$, so the $(1,1)$th element of $(g_{2,2}(V_k))^{-1}$ is $(1 + B_k^T(D_k - B_kB_k^T)^{-1}B_k)$. Let $J = a^*(a^*)^T$, then by (C4) and (C5), we have

$$g_{1,2}(V^*(z_k))(g_{2,2}(V_k))^{-1}g_{2,1}(V^*(z_k)) = (1 + B_k^T(D_k - B_kB_k^T)^{-1}B_k)J,$$

$g_{1,2}(V^*(z_k))\Delta_{2,1,k} = J\Delta_{1,1,k}$ and $B_k^TB_kJ = g_{1,2}(V^*(z_k))(\Delta_{2,2,k})^2g_{2,1}(V^*(z_k))$,
so the expression for $A_1(z_k)$ in (7.13) becomes

$$\begin{aligned}
& B_k^T((D_k - B_kB_k^T)^{-1} - I_{q_n-1})B_kJ + g_{1,2}(V^*(z_k))(\Delta_{2,2,k})^2g_{2,1}(V^*(z_k)) \\
& \quad - \Delta_{1,1,k}g_{1,2}(V^*(z_k))g_{2,1}(V^*(z_k))\Delta_{1,1,k} + \Delta_{1,2,k}\Delta_{2,1,k} \\
& \quad - g_{1,2}(V^*(z_k))\Delta_{2,2,k}\Delta_{2,1,k} - \Delta_{1,2,k}\Delta_{2,2,k}g_{2,1}(V^*(z_k)) + R_{1,n,k}.
\end{aligned}$$
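The step from (C6) to the $(1,1)$th element of $(g_{2,2}(V_k))^{-1}$ is the standard block-inverse (Schur complement) formula. A small numerical sketch, assuming a symmetric matrix with $(1,1)$th element 1 and off-diagonal block $B_k$ as described above; the particular random matrices are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
q = 5
b = 0.2 * rng.standard_normal((q - 1, 1))           # B_k, a (q - 1) x 1 block
m = rng.standard_normal((q - 1, q - 1))
# D_k is chosen so that D_k - B_k B_k^T is a small perturbation of the identity.
d = np.eye(q - 1) + 0.05 * (m + m.T) + b @ b.T
g22 = np.block([[np.ones((1, 1)), b.T], [b, d]])    # (1,1)th element equals 1

top_left = np.linalg.inv(g22)[0, 0]
formula = 1.0 + (b.T @ np.linalg.inv(d - b @ b.T) @ b).item()
print(abs(top_left - formula) < 1e-10)
```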

Let

$$\begin{aligned}
A_2(z_k) = {} & g_{1,2}(V^*(z_k))(g_{2,2}(W_{n,1,k}))^2g_{2,1}(V^*(z_k)) \\
& - g_{1,1}(W_{n,1,k})g_{1,2}(V^*(z_k))g_{2,1}(V^*(z_k))g_{1,1}(W_{n,1,k}) \\
& + g_{1,2}(W_{n,1,k})g_{2,1}(W_{n,1,k}) - g_{1,2}(V^*(z_k))g_{2,2}(W_{n,1,k})g_{2,1}(W_{n,1,k}) \\
& - g_{1,2}(W_{n,1,k})g_{2,2}(W_{n,1,k})g_{2,1}(V^*(z_k))
\end{aligned}$$

and

$$\begin{aligned}
R_{2,n,k} = {} & B_k^T((D_k - B_kB_k^T)^{-1} - I_{q_n-1})B_kJ \\
& - (nh_n^dc_Kf_Z(z_k))^{-1}A_2(z_k) + g_{1,2}(V^*(z_k))(\Delta_{2,2,k})^2g_{2,1}(V^*(z_k)) \\
& - \Delta_{1,1,k}g_{1,2}(V^*(z_k))g_{2,1}(V^*(z_k))\Delta_{1,1,k} + \Delta_{1,2,k}\Delta_{2,1,k} \\
& - g_{1,2}(V^*(z_k))\Delta_{2,2,k}\Delta_{2,1,k} - \Delta_{1,2,k}\Delta_{2,2,k}g_{2,1}(V^*(z_k)),
\end{aligned}$$

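then

$$A_1(z_k) = (nh_n^dc_Kf_Z(z_k))^{-1}A_2(z_k) + R_{1,n,k} + R_{2,n,k}, \tag{7.14}$$

where

$$\sum_{k=1}^{n_z}(\|R_{1,n,k}\|^2 + \|R_{2,n,k}\|^2) = O_p\left(\frac{\exp(-(\ln n)^{1/9})(\ln n)^{1/8}}{(nh_n^d)^2}\right) \tag{7.15}$$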
from Fact 4, (7.11) and (7.12), and a simple expression for $A_2(z_k)$ can be
obtained as stated below in (C7), which follows from (C4) and (C5).

(C7) For $1 \le k \le n_z$, $A_2(z_k) = C_kC_k^T$, where $C_k$ is the $p_n \times q_n$ matrix
obtained by replacing elements in the first row and first column of
$g_{1,2}(W_{n,1,k})$ with zeros.

Note that from (C7), we have that

$$\sum_{k=1}^{n_z}\|A_2(z_k)\|^2 = O_p(n_z(p_n - 1)^2(q_n - 1)^2) = O_p((\ln n)^{1/8}),$$

which, together with (7.14) and (7.15), implies that

$$\sum_{k=1}^{n_z}\|A_1(z_k)\|^2 = O_p((nh_n^d)^{-2}(\ln n)^{1/8}), \tag{7.16}$$

and then it follows from (7.16), Fact 4 and (7.12) that

$$\sum_{k=1}^{n_z}\|A_1(z_k)(g_{1,1}(V_k))^{-1} - A_1(z_k)\|^2 = O_p((nh_n^d)^{-3}(\ln n)^{1/4}). \tag{7.17}$$

For $1 \le k \le n_z$, let $\lambda_{0,k}$ be the largest eigenvalue of $A_2(z_k)$ and recall that
$\rho_0^2(z_k)$ is the largest eigenvalue of $A_1(z_k)(g_{1,1}(V_k))^{-1}$. Then by (7.14), (7.15)
and (7.17),

$$\sum_{k=1}^{n_z}(nh_n^dc_Kf_Z(z_k)\rho_0^2(z_k) - \lambda_{0,k})^2 = O_p(\exp(-(\ln n)^{1/9})(\ln n)^{1/8}). \tag{7.18}$$

Let $f_k$, $\rho(z_k)$ and $\lambda_k: 1 \le k \le n_z$ be random variables such that the joint distribution of $(f_k, \rho(z_k)): 1 \le k \le n_z$ is the same as that of $(\hat f_Z(z_k), \hat\rho(z_k)): 1 \le k \le n_z$, and the joint distribution of $(\rho(z_k), \lambda_k): 1 \le k \le n_z$ is the same as
that of $(\rho_0(z_k), \lambda_{0,k}): 1 \le k \le n_z$. Note that from (7.18) and the fact that

$$\sum_{k=1}^{n_z}\|A_2(z_k)\|^2 = O_p(n_z(p_n - 1)^2(q_n - 1)^2),$$

we have that

$$\sum_{k=1}^{n_z}nh_n^dc_Kf_Z(z_k)\rho^2(z_k) = \sqrt{O_p(n_z^2(p_n - 1)^2(q_n - 1)^2)} = O_p((\ln n)^{1/16}),$$

so $nh_n^d\sum_{k=1}^{n_z}(\rho(z_k))^2 = O_p((\ln n)^{1/16})$,

$$\begin{aligned}
& \left|nh_n^dc_K\sum_{k=1}^{n_z}f_k(\rho(z_k))^2 - nh_n^dc_K\sum_{k=1}^{n_z}f_Z(z_k)(\rho(z_k))^2\right| \\
& \quad \le nh_n^dc_K\left(\sum_{k=1}^{n_z}(f_k - f_Z(z_k))^2\right)^{1/2}\sum_{k=1}^{n_z}(\rho(z_k))^2 \\
& \quad = O_p((\ln n)^{1/16})(O_p(n_z(nh_n^d)^{-1}))^{1/2} \\
& \quad = O_p((nh_n^d)^{-1/2}(\ln n)^{3/32})
\end{aligned}$$

and

$$\begin{aligned}
& \left|nh_n^dc_K\sum_{k=1}^{n_z}f_k(\rho(z_k))^2 - \sum_{k=1}^{n_z}\lambda_k\right| \\
& \quad \le O_p((nh_n^d)^{-1/2}(\ln n)^{3/32}) + \left|nh_n^dc_K\sum_{k=1}^{n_z}f_Z(z_k)(\rho(z_k))^2 - \sum_{k=1}^{n_z}\lambda_k\right| \\
& \quad \le O_p((nh_n^d)^{-1/2}(\ln n)^{3/32}) + \sqrt{n_z}(O_p(\exp(-(\ln n)^{1/9})(\ln n)^{1/8}))^{1/2} \qquad \text{[by (7.18)]} \\
& \quad = O_p(\exp(-0.5(\ln n)^{1/9})(\ln n)^{3/32}).
\end{aligned}$$

The proof of Theorem 3.2 is complete.

7.4. Proof of Corollary 1. To prove Corollary 1, it is sufficient to establish (3.25) and (3.26). To see this, let $f_k$, $\rho^2(z_k)$ and $\lambda_k: 1 \le k \le n_z$ be as in
Theorem 3.2, then

$$\frac{nh_n^dc_K\sum_{k=1}^{n_z}\hat f_Z(z_k)\hat\rho^2(z_k) - n_z\mu_{p_n,q_n}}{\sqrt{n_z}\,\sigma_{p_n,q_n}}$$

has the same distribution as

$$\frac{nh_n^dc_K\sum_{k=1}^{n_z}f_k\rho^2(z_k) - n_z\mu_{p_n,q_n}}{\sqrt{n_z}\,\sigma_{p_n,q_n}} = \underbrace{\frac{nh_n^dc_K\sum_{k=1}^{n_z}f_k\rho^2(z_k) - \sum_{k=1}^{n_z}\lambda_k}{\sqrt{n_z}\,\sigma_{p_n,q_n}}}_{I} + \underbrace{\frac{\sum_{k=1}^{n_z}\lambda_k - n_z\mu_{p_n,q_n}}{\sqrt{n_z}\,\sigma_{p_n,q_n}}}_{II}.$$

Suppose that (3.25) holds, then $I \to 0$ almost surely by (3.24) and Theorem
3.2. Also, (3.26) says that $II$ converges to $N(0, 1)$ in distribution. Therefore,
(3.27) holds if (3.25) and (3.26) hold.

To establish (3.26), we will verify the Lyapounov condition

$$\lim_{n\to\infty}\frac{\sum_{k=1}^{n_z}E|\lambda_k - \mu_{p_n,q_n}|^3}{(n_z\sigma^2_{p_n,q_n})^{3/2}} = 0 \tag{7.19}$$

and then apply Lindeberg's central limit theorem. Let $\lambda$ be the largest eigenvalue of $CC^T$. Then $\lambda \le \mathrm{tr}(CC^T)$, where $\mathrm{tr}(CC^T)$ is the trace of $CC^T$, which
follows the $\chi^2$ distribution with degrees of freedom $m_{1,n} = (p_n - 1)(q_n - 1)$.
Therefore,

$$E\lambda^3 \le E(\mathrm{tr}(CC^T))^3 = m_{1,n}(m_{1,n} + 2)(m_{1,n} + 4),$$

which implies that $E|\lambda_1 - \mu_{p_n,q_n}|^3 = O(p_n^3q_n^3)$, so (7.19) follows from (3.25)
and (3.26) holds.
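The third-moment identity used above, $E(\chi^2(m))^3 = m(m+2)(m+4)$, follows from the general raw-moment formula $E(\chi^2(m))^k = 2^k\Gamma(m/2 + k)/\Gamma(m/2)$. The short sketch below compares the two expressions for a few illustrative degrees of freedom; the function names are not from the paper.

```python
import math

def chi2_raw_moment(df, k):
    """E[(chi2(df))^k] = 2^k * Gamma(df/2 + k) / Gamma(df/2), via log-gamma."""
    return 2.0 ** k * math.exp(math.lgamma(df / 2.0 + k) - math.lgamma(df / 2.0))

def third_moment_formula(m):
    """Closed form m (m + 2) (m + 4) used in the proof."""
    return m * (m + 2) * (m + 4)

pairs = [(chi2_raw_moment(m, 3), third_moment_formula(m)) for m in (1, 2, 6, 20, 99)]
for got, expected in pairs:
    print(got, expected)
```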

It remains to prove (3.25). Consider first the case where (i) holds. By
Theorem 1.1 in Johnstone [7],

$$\frac{\lambda_1 - \mu_n}{\sigma_n} \text{ converges in distribution as } n \to \infty, \tag{7.20}$$

where

$$\mu_n = \left(\sqrt{q_n - 2} + \sqrt{p_n - 1}\right)^2$$

and

$$\sigma_n = \left(\sqrt{q_n - 2} + \sqrt{p_n - 1}\right)\left(\frac{1}{\sqrt{q_n - 2}} + \frac{1}{\sqrt{p_n - 1}}\right)^{1/3}.$$

Here the limiting distribution is the Tracy-Widom law of order 1. Let $F$
denote its cumulative distribution function. Suppose that $\varepsilon$, $t_1$ and $t_2$ are real
numbers such that $t_1 < t_1 + \varepsilon < t_2 - \varepsilon$, which implies that $F(t_2) > F(t_2 - \varepsilon)$
and $F(t_1 + \varepsilon) > F(t_1)$. From (7.20),

$$P(\lambda_1 > \mu_n + (t_2 - \varepsilon)\sigma_n) > 1 - F(t_2)$$

and

$$P(\lambda_1 < \mu_n + (t_1 + \varepsilon)\sigma_n) > F(t_1),$$

if $n$ is large enough. For such $n$, we have

$$\sigma^2_{p_n,q_n} \ge \frac{\min(F(t_1), 1 - F(t_2))(t_2 - t_1 - 2\varepsilon)^2\sigma_n^2}{4},$$

which gives (3.25). The proof of (3.25) for the case where (ii) holds can
be done by reversing the roles of $p_n$ and $q_n$. The proof of Corollary 1 is
complete.
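For readers who want to evaluate the centering and scaling constants in (7.20), the following sketch computes them for given $p_n$ and $q_n$. It assumes the reading $\mu_n = (\sqrt{q_n - 2} + \sqrt{p_n - 1})^2$ and $\sigma_n = (\sqrt{q_n - 2} + \sqrt{p_n - 1})(1/\sqrt{q_n - 2} + 1/\sqrt{p_n - 1})^{1/3}$, which corresponds to Johnstone's constants for a $(p_n - 1) \times (q_n - 1)$ matrix of i.i.d. $N(0,1)$ entries; the function name is hypothetical.

```python
import math

def johnstone_constants(p_n, q_n):
    """Centering mu_n and scaling sigma_n for the largest eigenvalue of C C^T,
    with C a (p_n - 1) x (q_n - 1) standard normal matrix (assumed reading)."""
    r = math.sqrt(q_n - 2)
    s = math.sqrt(p_n - 1)
    mu = (r + s) ** 2
    sigma = (r + s) * (1.0 / r + 1.0 / s) ** (1.0 / 3.0)
    return mu, sigma

mu, sigma = johnstone_constants(5, 6)
print(mu, sigma)
```

With $p_n = 5$ and $q_n = 6$ both square roots equal 2, which makes the two constants easy to confirm by hand.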